METHODS AND COMPOSITIONS FOR
IDENTIFYING OSTEOGENIC AGENTS

Technical Field

The present invention relates to assay techniques for identifying agents which 
modulate bone growth. 

Background of the Invention 

Although there is a great deal of information available on the factors which
influence the breakdown and resorption of bone, information on growth factors which
stimulate the formation of new bone is more limited. Investigators have searched for
sources of such activities and have found that bone tissue itself is a storehouse for
factors which have the capacity for stimulating bone cells. Thus, extracts of bovine
tissue obtained from slaughterhouses contain not only structural proteins which are
responsible for maintaining the structural integrity of bone, but also biologically
active bone growth factors which can stimulate bone cells to proliferate. Among these
latter factors are transforming growth factor β, the heparin-binding growth factors
(acidic and basic fibroblast growth factor), the insulin-like growth factors
(insulin-like growth factor I and insulin-like growth factor II) and a recently
described family of proteins called bone morphogenetic proteins (BMPs). All of these
growth factors have effects on other types of cells as well as on bone cells.

The BMPs are novel factors in the extended transforming growth factor β family.
They were first identified in extracts of demineralized bone (Urist 1965, Wozney et al.,
1988). Recombinant BMP-2 and BMP-4 can induce new bone formation when they are
injected locally into the subcutaneous tissues of rats (Wozney 1992, Wozney & Rosen
1993). These factors are expressed by normal osteoblasts as they differentiate, and have
been shown to stimulate osteoblast differentiation and bone nodule formation in vitro as
well as bone formation in vivo (Harris et al., 1994). This latter property suggests potential
usefulness as therapeutic agents in diseases which result in bone loss.

The cells which are responsible for forming bone are osteoblasts. As osteoblasts
differentiate from precursors to mature bone-forming cells, they express and secrete a
number of the structural proteins of the bone matrix including Type I collagen, osteocalcin,
osteopontin and alkaline phosphatase (Stein et al., 1990, Harris et al., 1994). They also
synthesize a number of growth regulatory peptides which are stored in the bone matrix and
are presumably responsible for normal bone formation. These growth regulatory peptides
include the BMPs (Harris et al., 1994). In studies of primary cultures of fetal rat calvarial
osteoblasts, BMPs 1, 2, 3, 4, and 6 are expressed by cultured cells prior to the formation of
mineralized bone nodules (Harris et al., 1994). Expression of the BMPs coincides with
expression of alkaline phosphatase, osteocalcin and osteopontin.

Although the BMPs have powerful effects to stimulate bone formation in vitro and
in vivo, there are disadvantages to their use as therapeutic agents to enhance bone healing.
Receptors for the bone morphogenetic proteins have been identified in many tissues, and
the BMPs themselves are expressed in a large variety of tissues in specific temporal and
spatial patterns. This suggests that they may have effects on many tissues other than bone,
potentially limiting their usefulness as therapeutic agents when administered systemically.
Moreover, since they are peptides, they would have to be administered by injection. These
disadvantages are severe limitations to the development of BMPs as therapeutic agents.

It is an object of the present invention to overcome the limitations inherent in
known osteogenic agents by providing a method to identify potential drugs which would
stimulate production of BMPs locally in bone.

Prior Art 

Sequence data on small fragments of the 5'-flanking region of the BMP-4 gene have
been published (Chen et al., 1993; Kurihara et al., 1993), but the promoter has not been
previously functionally identified or isolated.

Disclosure of the Invention 

A cell-based assay technique for identifying and evaluating compounds which
stimulate the growth of bone is provided, comprising culturing a host cell line comprising
an expression vector comprising a DNA sequence encoding a promoter region of at least
one bone morphogenetic protein, operatively linked to a reporter gene encoding an
assayable product under conditions which permit expression of said assayable product,
contacting the cultured cell line with at least one compound suspected of possessing
osteogenic activity, and identifying osteogenic agents by their ability to modulate the
expression of the reporter gene and thereby increase the production of the assayable
product.

This assay technique specifically identifies osteogenic agents which stimulate bone
cells to produce bone growth factors in the bone morphogenetic protein family. These
osteogenic agents display the capacity to increase the activity of the promoters of genes of
members of the BMP family and other bone growth factors normally produced by, e.g., bone
cells.

Also provided in accordance with the present invention are isolated DNA sequences 
encoding a promoter region of at least one bone morphogenetic protein, and a system for 
identifying osteogenic agents comprising an expression vector comprising such promoter 
sequences operatively linked to a reporter gene encoding an assayable product, and means 
for detecting the assayable product produced in response to exposure to an osteogenic
compound. 

Brief Description of the Drawings 

Figure 1A graphically depicts a restriction enzyme map of mouse genomic BMP-4
and a diagram of two transcripts. The mouse BMP-4 gene transcription unit is ~7kb and
contains 2 coding exons (closed boxes) and 3 non-coding exons, labeled exons 1A, 1B
and 2. This 19kb clone has a ~6kb 5'-flanking region and a ~7kb 3'-flanking region.
The diagram shows approximately 2.4kb of the 5'-flanking region, and a small region of
the 3'-flanking region. The lower panel shows two alternative transcripts of BMP-4.
Both have the same exons 2, 3 and 4 but a different exon 1. Transcript A has exon 1A and
transcript B has exon 1B, whose size was estimated according to RT-PCR and primer
extension analysis in FRC cells;

Figure 1B depicts the DNA sequence of selected portions of mouse genomic BMP-4
(SEQ. ID NO. 1) and the predicted amino acid sequences of the identified coding exons
(SEQ. ID NO. 2). The numbers on the right show the position of the nucleotide sequence
and the bold numbers indicate the location of the amino acid sequence of the coding region.
Most of the coding sequence is in exon 4. The end of the transcription unit was estimated
based on a 1.8kb transcript. Primer 1 in exon 1A was used in RT-PCR analysis with Primer
3 in exon 3. Primer 2 in exon 1B was used in RT-PCR analysis with Primer 3. Primers B1
and B2 were used in primer extension reactions;

Figure 1C portrays the sequence of the BMP-4 exon 1A 5'-flanking region and
potential response elements in the mouse BMP-4 1A promoter (SEQ. ID NO. 3). The
sequences of 2688 bp of the mouse BMP-4 gene are shown. Nucleotides are numbered on
the left with +1 corresponding to the major transcription start site of the 1A promoter. The
response elements of DR-1A Proximal and DR-1A Distal oligonucleotides are indicated.
The other potential response DNA elements in the boxes are p53, RB (retinoblastoma),
SP-1, AP-1, and AP-2. Primer A, indicated by the line above the DNA sequence at +114 to
+96, was used for primer extension analysis of exon 1A-containing transcripts;

Figure 2 depicts the results of a primer extension assay. Total RNAs prepared from
FRC cells (on the left frame) and mouse embryo 9.5 days (on the right) were used with
primer A or the complement of primer 2. Two major extended fragments, 67 and 115 bp,
indicated in lane A, were obtained from primer A. Two 1B primers, primer B1 and primer
B2, gave negative results with both FRC and mouse embryo total RNA as template.
Transcript B is not detectable with this assay. By RT-PCR, transcript B can be detected
and quantified;

Figure 3A is a photographic representation of gel electrophoresis of 1A-3 and 1B-3
RT-PCR products of the BMP-4 gene. RT-PCR was performed with two pairs of primers
using FRC cell poly A+ mRNA as the template. The products were verified by the DNA
sequence;

Figure 3B is a schematic diagram of spliced BMP-4 RT-PCR products with 1A and
1B exons in FRC cells. RT-PCR was performed with two pairs of primers using FRC cell
poly A+ mRNA as the template. The diagram shows where the primers are located in the
BMP-4 genomic DNA. RT-PCR product 1A-2-3, which contains exon 1A, exon 2 and the
5' region of exon 3, was produced with primer 1 and primer 3. Primer 2 and primer 3
generated two RT-PCR products with the exon 1B-2-3 pattern. The heterogeneity in size
of exon 1B is indicated. The 1A promoter is predominantly utilized in bone cells;

Figure 4A provides a map of the BMP-4 1A 5'-flanking-CAT plasmid and
promoter activity in FRC cells. The 2.6kb EcoRI and Xba I fragment, 1.3kb Pst fragment,
0.5kb Sph I and Pst fragment, and 0.25kb PCR fragment were inserted into pBLCAT3.
The closed box indicates the non-coding exon 1A. The CAT box represents the CAT
reporter gene. The values represent percentages of CAT activity, with the activity
expressed by pCAT-2.6 set at 100%. The values represent the average of four
independent assays;






WO 96/38590 PCT/US96/08197




Figure 4B provides an autoradiogram of CAT assays using FRC cells transfected
with BMP-4 1A 5'-flanking-CAT plasmids identified in Figure 4A;

Figure 5 portrays the nucleotide sequence of the mouse BMP-2 gene 5'-flanking
region from -2736 to +139 (SEQ. ID NO. 4). The transcription start site is denoted by +1;

Figure 6A depicts an autoradiogram showing products of a primer extension assay
for determination of the transcription start site of the BMP-2 gene, separated on an 8%
denaturing urea-polyacrylamide gel, in which Lane 1: total RNA from fetal rat calvarial
osteoblast cells, and Lane 2: control lane with 10μg of yeast tRNA. All RNA samples
were primed with a 32P-labeled oligonucleotide from exon 1 of the mouse BMP-2 gene, as
indicated in Figure 6B. Lane M: 32P-labeled MspI-digested X phage DNA, containing
DNA fragments spanning from 623 bp to 15 bp (size marker);

Figure 6B provides a schematic representation of the primer extension assay. The
primer used is an 18mer synthetic oligonucleotide, 5'-CCCGGCAAGTTCAAGAAG-3'
(SEQ. ID NO. 5);



Figure 7 provides a diagram of selected BMP-2 promoter-luciferase reporter
constructs. BMP-2 5'-flanking sequences are designated by hatched boxes (□) and
luciferase cDNA is designated by the filled box (■). Base +114 denotes the 3' end of the
BMP-2 gene in all the constructs;

Figure 8 displays the luciferase enzyme activity for the BMP-2 gene-LUC
constructs (shown in Figure 7) transfected in primary fetal rat calvarial osteoblasts (A),
HeLa cells (B) and ROS 17/2.8 osteoblasts (C). The luciferase activity has been
normalized to β-galactosidase activity in the cell lysates;

Figure 9A-F depicts the DNA sequence of the mouse BMP-2 promoter and gene
(SEQ. ID NO. 6); and

Figure 10A-D depicts the DNA sequence of the mouse BMP-4 promoter and gene
(SEQ. ID NO. 7).

Figure 11 depicts the resequencing of the BMP-2 5'-flanking region.

Detailed Description of the Preferred Embodiments

A cell-based assay technique for identifying and evaluating compounds which
stimulate the growth of bone is provided, comprising culturing a host cell line comprising
an expression vector comprising a DNA sequence encoding a promoter region of at least
one bone morphogenetic protein operatively linked to a reporter gene encoding an
assayable product under conditions which permit expression of said assayable product,
contacting the cultured cell line with at least one compound suspected of possessing
osteogenic activity, and identifying osteogenic agents by their ability to modulate the
expression of the reporter gene and thereby increase the production of the assayable
product.

The present invention is distinguished from other techniques for identifying
bone-active compounds, as it specifically identifies chemical compounds, agents, factors
or other substances which stimulate bone cells to produce the bone growth factors in the
bone morphogenetic protein (BMP) family (hereinafter "osteogenic agents"). These
osteogenic agents are identified by their capacity to increase the activity of the promoters
of genes of members of the BMP family and other bone growth factors which are normally
produced by bone cells, and other cells including cartilage cells, tumor cells and prostatic
cells. When patients are treated with such chemical compounds, the relevant BMP will be
produced by bone cells and then be available locally in bone to enhance bone growth or
bone healing. Such compounds identified by this assay technique will be used for the
treatment of osteoporosis, segmental bone defects, fracture repair, prosthesis fixation or
any disease associated with bone loss.

Compounds that inhibit bone morphogenetic protein expression in bone or cartilage
may also be useful in clinical situations of excess bone formation, which occurs in such
diseases as osteoblastic metastases or osteosclerosis of any cause. Such compounds can
also be identified in accordance with the present invention.

Also provided in accordance with the present invention are isolated DNA sequences 
encoding a promoter region of at least one bone morphogenetic protein, and a system for 
identifying osteogenic agents comprising an expression vector comprising such promoter 
sequences operatively linked to a reporter gene encoding an assayable product, and means 
for detecting the assayable product produced in response to exposure to an osteogenic
compound. 

The promoters of the genes for BMP-4 and BMP-2 are complex promoters which
can be linked to reporter genes, such as, e.g., the firefly luciferase gene. When the hybrid
genes (for example, bone cell BMP-4 promoter or bone cell BMP-2 promoter and firefly
luciferase, chloramphenicol acetyl transferase (CAT) cDNAs, or cDNAs for other
reporter genes such as β-galactosidase, green fluorescent protein, human growth hormone,
alkaline phosphatase, β-glucuronidase, and the like) are transfected into bone cells,
osteogenic agents which activate the BMP-4 or BMP-2 promoters can be identified by their
capacity in vitro to increase luciferase activity in cell lysates after cell culture with the
agent.
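
By way of illustration only, the identification step described above can be sketched as a
fold-induction calculation. The following Python sketch assumes that reporter readings (for
example, luciferase activity in cell lysates) are available for vehicle-treated and
compound-treated cultures. The function names, the compound labels, the readings and the
2-fold threshold are hypothetical and are not taken from the present disclosure.

```python
from statistics import mean


def fold_induction(treated, vehicle):
    """Mean reporter signal of treated wells divided by mean vehicle signal."""
    if not treated or not vehicle:
        raise ValueError("each group needs at least one reading")
    baseline = mean(vehicle)
    if baseline <= 0:
        raise ValueError("vehicle signal must be positive")
    return mean(treated) / baseline


def call_hits(readings, vehicle, threshold=2.0):
    """Names of compounds whose fold induction reaches the chosen threshold.

    The 2-fold default is an arbitrary, illustrative cut-off.
    """
    return sorted(
        name
        for name, wells in readings.items()
        if fold_induction(wells, vehicle) >= threshold
    )


if __name__ == "__main__":
    # Illustrative numbers only; not data from the disclosure.
    vehicle_wells = [100.0, 110.0, 90.0]
    compound_wells = {
        "compound_A": [310.0, 290.0, 300.0],
        "compound_B": [105.0, 95.0, 100.0],
    }
    for name, wells in compound_wells.items():
        print(name, round(fold_induction(wells, vehicle_wells), 2))
    print("candidate osteogenic agents:", call_hits(compound_wells, vehicle_wells))
```

In practice the threshold would be chosen from the variability of replicate control
cultures, and an inhibitory compound could be flagged in the same way with a threshold
below 1.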

Sequence data on small fragments of the 5'-flanking region of the BMP-4 gene have
been published (Chen et al., 1993; Kurihara et al., 1993), but the promoter has not been
previously identified or isolated, and methods for regulating transcription have not been
shown. The present invention isolates the promoters for the BMP genes and utilizes these
promoters in cultured bone cells so that agents can be identified which specifically
increase BMP-2 or BMP-4 production locally in bone. Since it is known that the BMPs are
produced by bone cells, a method for enhancing their production specifically in bone should
avoid systemic toxicity. This benefit is obtained by utilizing the unique tissue-specific
promoters for the BMPs which are provided herein, and then using these gene promoters to
identify agents which enhance their activity in bone cells.

By utilizing the disclosure provided herein, other promoters can be obtained from
additional bone morphogenetic proteins such as BMP-3, BMP-5, BMP-6, and BMP-7, to
provide comparable benefits to the promoters herein specifically described.

In addition, the present invention contemplates the use of promoters from additional
growth factors in osteoblastic cells. Included are additional bone morphogenetic proteins,
as well as fibroblast growth factors (e.g. FGF-1, FGF-2, and FGF-7), transforming growth
factors β-1, β-2, and β-3, insulin-like growth factor-1, insulin-like growth factor-2,
platelet-derived growth factor, and the like. Such promoters will readily be utilized in the
present invention to provide comparable benefits.

The cells which can be utilized in the present invention include primary cultures of
fetal rat calvarial osteoblasts, established bone cell lines available commercially (MC3T3-E1
cells, MG-63 cells, U2OS cells, UMR106 cells, ROS 17/2.8 cells, SaOS2 cells, and the like
as provided in the catalog from the American Type Culture Collection (ATCC)), and bone
cell lines established from transgenic mice, as well as other cell lines capable of serving as
hosts for the present vectors and systems. In addition, a number of tumor cell lines also
express BMPs, including the prostate cancer cell lines PC3, LNCaP, and DU145, as well
as the human cancer cell line HeLa. Thus, any of a number of cell lines will find use in the
present invention and the selection of an appropriate cell line will be a matter of choice for a
particular embodiment.

The following examples serve to illustrate certain preferred embodiments and
aspects of the present invention and are not to be construed as limiting the scope thereof.



EXPERIMENTAL 

In the experimental disclosure which follows, the following abbreviations apply: eq
(equivalents); M (Molar); mM (millimolar); μM (micromolar); N (Normal); mol (moles);
mmol (millimoles); μmol (micromoles); nmol (nanomoles); kg (kilograms); gm (grams); mg
(milligrams); μg (micrograms); ng (nanograms); L (liters); ml (milliliters); μl (microliters);
vol (volumes); and °C (degrees Centigrade).



Example 1: DESCRIPTION AND CHARACTERIZATION OF MURINE 
BMP-4 GENE PROMOTER 

(a) Library Screening, Cloning and Sequencing of Gene 

A mouse genomic Lambda FIX II spleen library (Stratagene, La Jolla, CA) was
screened with a mouse embryo BMP-4 cDNA kindly provided by Dr. B.L.M. Hogan
(Vanderbilt University School of Medicine, Nashville, TN). The probe was labeled with
[α-32P]dCTP using a random-primer labeling kit from Boehringer-Mannheim (Indianapolis,
IN). Plaque lift filters were hybridized overnight in 6X SSC, 5X Denhardt's, 0.5% SDS
containing 200μg/ml sonicated salmon sperm DNA, 10μg/ml poly A and 10μg/ml tRNA at
68°C. The filters were washed at 55°C for 20 min, twice in 2X SSC, 0.1% SDS buffer and
once in 0.5X SSC, 0.1% SDS. The isolated phage DNA clones were analyzed according to
standard procedures (Sambrook et al., 1989).

Fragments from positive clones were subcloned into pBluescript vectors
(Stratagene, La Jolla, CA) and sequenced in both directions using the Sequenase
dideoxynucleotide chain termination sequencing kit (U.S. Biochemical Corp., Cleveland,
OH).

Three clones were isolated from 2 x 10^6 plaques of a mouse spleen 129 genomic library
using a full length coding region mouse embryo BMP-4 cDNA probe (B. Hogan, Vanderbilt
University, Nashville, TN). One 19kb clone contained 5 exons, a ~6kb 5'-flanking region
and a ~7kb 3'-flanking region, as shown in Figure 1A. The 7kb transcription unit and the
5'-flanking region of the mouse BMP-4 gene were sequenced (Figure 10).

The nucleotide sequence of selected portions of mouse BMP-4 and the deduced
amino acid sequence of the coding exons (408 residues; SEQ. ID NO. 2) are shown in Figure
1B. Primers used in the RT-PCR experiments described below are indicated in this Figure.

Figure 1C shows the DNA sequence of 2372bp of the 5'-flanking region and the
candidate DNA response elements upstream of exon 1A. Primers used in primer extensions
are also shown in Figures 1B and 1C.

(b) Primer Extension Mapping of the Transcriptional Start-Site of the Mouse BMP-4 
Gene 

The transcriptional start-sites were mapped by primer extension using the synthetic
oligonucleotide primer A 5'-CGGATGCCGAACTCACCTA-3' (SEQ. ID NO. 8),
corresponding to the complement of nucleotides +114 to +96 in the exon 1A sequence, and
the oligonucleotide primer B1 5'-CTACAAACCCGAGAACAG-3' (SEQ. ID NO. 9),
corresponding to the complement of nucleotides +30 to +13 of the exon 1B sequence.
Total RNA from fetal rat calvarial (FRC) cells and 9.5 day mouse embryo (gift of B.
Hogan, Vanderbilt University) was used with both primers. The primer extension assay
was carried out using the primer extension kit from Promega (Madison, WI). The
annealing reactions were, however, carried out at 60°C in a water bath for 1 hr. The
products were then electrophoresed on 8% denaturing-urea polyacrylamide gels and
autoradiographed.

One additional oligonucleotide primer B2 5'-CCCGGCACGAAAGGAGAC-3'
(SEQ. ID NO. 10), corresponding to the complement of nucleotide sequence +69 to +52 of
exon 1B, was also utilized in primer extension reactions with FRC and mouse embryo
RNAs.
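
As an illustrative consistency check, the stated coordinates of each primer can be compared
with its length, since an inclusive range from +114 to +96 spans 19 nucleotides. The Python
sketch below uses the primer sequences and coordinates recited above; the helper names are
hypothetical, and the reverse complement is shown only to indicate the template strand to
which each primer would anneal.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def span_length(start, end):
    """Nucleotides covered by an inclusive range lying on one side of +1."""
    if start == 0 or end == 0 or (start > 0) != (end > 0):
        # Promoter numbering has no position 0, so ranges crossing +1
        # would need separate handling.
        raise ValueError("range must lie entirely upstream or downstream of +1")
    return abs(start - end) + 1


# name: (sequence 5'->3', first coordinate, last coordinate), as recited above
PRIMERS = {
    "A": ("CGGATGCCGAACTCACCTA", 114, 96),
    "B1": ("CTACAAACCCGAGAACAG", 30, 13),
    "B2": ("CCCGGCACGAAAGGAGAC", 69, 52),
}


def check_primers(primers):
    """Map each primer name to True if its length matches its coordinates."""
    return {
        name: len(seq) == span_length(start, end)
        for name, (seq, start, end) in primers.items()
    }


if __name__ == "__main__":
    for name, ok in check_primers(PRIMERS).items():
        seq = PRIMERS[name][0]
        print(name, len(seq), "nt", "consistent" if ok else "MISMATCH",
              "template:", reverse_complement(seq))
```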

1. Evidence for utilization of two alternate exon 1 sequences for the BMP-4 gene.

Several BMP-4 cDNAs were sequenced from the prostate cancer cell line PC-3 and from
primary FRC cells. Four independent FRC cell BMP-4 cDNAs all contained exon 1A.
However, the human prostate carcinoma cell line (PC-3) cDNA contained an apparently
unique exon 1B sequence spliced to exon 2 (Chen et al., 1993). A double-stranded
oligonucleotide probe (70bp) to exon 1B was synthesized based on the human PC-3 exon
1B sequence. This exon 1B probe was then used to identify the exon 1B region in the
mouse genomic BMP-4 clone. The candidate exon 1B is 1696bp downstream from the 3'
end of exon 1A.

2. Primer extension analysis 

Primer extension analysis was performed to map the mouse BMP-4 gene
transcription start sites. Primer A, an oligonucleotide from exon 1A, and two
oligonucleotides from exon 1B were used. Total RNA was utilized both from mouse embryo
and FRC cells. As shown in Figure 2, a major extended fragment from primer A was obtained
in both mouse embryo and FRC cell total RNAs, which migrates at 115bp. The extended
5'-end of the 115bp fragment represents the major transcription start site for 1A-containing
transcripts. The size of this 5' non-coding exon 1A is 306bp. A major extended fragment
from the complement of primer B1 (exon 1B) was not detected using either mouse embryo
or FRC cell total RNAs. One other primer from exon 1B also gave negative results. Thus,
in 9.5 day mouse embryo and FRC cells, the exon 1B-containing transcripts were not
detectable, which suggests that transcripts containing exon 1B are less abundant
in these cells and tissues than transcripts containing exon 1A. All primer extensions were
carried out after annealing of primers at high stringency. Lower stringency annealing with
1B primers gave extended products not associated with BMP-4 mRNA.

(c) BMP-4 Gene 5'-Flanking Region for Exon 1A and 1B Transcripts.

Four FRC BMP-4 cDNAs were sequenced and found to contain exon 1A sequences
spliced to exon 2. The human U2OS BMP-4 cDNA sequence also contains exon 1A
(Wozney et al., 1988). This suggests the BMP-4 gene sequences upstream of exon 1A are
used primarily in bone cells.

To test whether the BMP-4 1B promoter is utilized at all in FRC cells,
oligonucleotide primers were designed to ascertain whether spliced 1B-2-3 exon products
and 1A-2-3 exon (control) products could be obtained by the more sensitive RT-PCR
technique using FRC poly(A)+ RNA. The 3' primer was in exon 3 (Figure 1B - Primer 3)
and the 5' primers were either in exon 1A (primer 1) or exon 1B (primer 2).

The RT-PCR products were cloned and sequenced. A photograph and diagram of
the products obtained are presented in Figures 3A and 3B. Both 1A-2-3 and 1B-2-3
products were obtained. The results indicate FRC osteoblasts produce transcripts with
either a 1A exon or a 1B exon, but not both. This suggests that the intron region between
the 1A and 1B exons could contain regulatory response elements under certain conditions. Of
the 1B-2-3 RT-PCR products obtained from FRC osteoblasts, two products were obtained
with different 3' splice sites for the exon 1B. By comparison with the genomic DNA, both
3' ends of the two exon 1Bs have reasonable 5' splice consensus sequences, consistent with
an alternate splicing pattern obtained for the 1B-2-3 RT-PCR products. Most importantly,
no 1A-1B-2-3 RT-PCR splice products of the BMP-4 gene were obtained. Thus, 1B does
not appear to be an alternatively spliced 5' non-coding exon. By quantitative RT-PCR, it
was shown that 1A transcripts are 10 to 15X more abundant than 1B transcripts in primary
bone cells.

The RT-PCR technique was performed as follows. First-strand cDNA was
synthesized from 1μg FRC cell poly(A)+ RNA with an 18mer dT primer using
Superscript™ reverse transcriptase (Gibco BRL) in a total volume of 20μl. The cDNA
was then used as a template for PCR with two sets of synthesized primers. As shown in
Figure 1B, primer 1 (5'-GAAGGCAAGAGCGCGAGG-3') (SEQ. ID NO. 11),
corresponding to a 3' region of exon 1A, and primer 3 (5'-CCGGTCTCAGGTATCA-3')
(SEQ. ID NO. 12), corresponding to a 5' region of exon 3, were used to generate the exon
1A-2-3 spliced PCR product. Primer 2 (5'-CAGGCGGAAAGCTGTTC-3') (SEQ. ID NO.
13), corresponding to a 3' region (+2 to +18) of exon 1B, and primer 3 were used to
generate exon 1B-2-3 spliced PCR products. The GeneAmp PCR kit was used according to the
manufacturer's procedure (Perkin-Elmer/Cetus, Norwalk, CT). Each cycle consisted of a
denaturation step (94°C for 1 min), an annealing step (59°C for 2 min) and an elongation
step (72°C for 1 min). The PCR products were analyzed by agarose gel electrophoresis for
size determination. The products were subcloned into pCR II vector using the TA cloning kit
(Invitrogen, San Diego, CA). The inserts were sequenced in both directions with a
sequencing kit from U.S. Biochemical (Cleveland, OH).

Northern analysis demonstrated that the single 1.8kb BMP-4 transcript detected in
FRC cells during bone cell differentiation hybridizes to both a pure 1A exon probe and an
exon 2-4 probe. The ratio of the 1A to 2-4 signal is constant through the changing levels of
BMP-4 expression during differentiation. Using a 1B exon probe, no detectable
hybridization to the BMP-4 exon 2-4 1.8kb signal was observed. This again indicates that
1A-containing transcripts predominate in bone cells, although 1B transcripts can be
detected by the more sensitive PCR method. By quantitative PCR it was shown that 1A
transcripts are 10-15X more abundant than 1B in FRC cells.

(d) BMP-4 Promoter 1A Plasmid Construction and Transfection, and Detection of
Promoter Activity in Osteoblasts.

Three BMP-4 1A promoter/plasmids were constructed by excising fragments from
the 5'-flanking region of the mouse BMP-4 gene and cloning them into pBLCAT3 expression
vectors (Luckow and Schutz, 1987). The pCAT-2.6 plasmid was the pBLCAT3 vector
with a 2.6kb EcoRI and Xba I fragment (-2372/+258) of the BMP-4 gene. The pCAT-1.3
plasmid was similarly generated from a 1.3kb Pst fragment (-1144/+212). The pCAT-0.5
plasmid was made from a 0.5kb Sph I and Pst fragment (-260/+212). Both the pCAT-1.3
and the pCAT-0.5 plasmids have 212bp of exon 1A non-coding region. An additional
promoter/plasmid was created from a PCR amplified product, corresponding to the 240bp
sequence between nucleotides -25 and +212, and referred to as pCAT-0.24. The
amplified fragment was first cloned into pCR II vector using the TA cloning kit (Invitrogen,
San Diego, CA) and then the fragment was released with Hind III and Xho I, and religated
into pBLCAT3. Correct orientation of all inserts with respect to the CAT vector was
verified by DNA sequencing.

The cells used for transient transfection studies were isolated from 19 day-old fetal
rat calvariae by sequential digestion with trypsin and collagenase, as described by Bellows
et al. (1986) and Harris et al. (1994). In brief, the calvarial bones were surgically removed
and cleaned by washing in α minimal essential medium (αMEM) containing 10% v/v fetal
calf serum (FCS) and antibiotics. The bones were minced with scissors and were
transferred to a 35mm tissue culture dish containing 5ml of sterile bacterial collagenase
(0.1%) and trypsin (0.05%). This was then incubated at 37°C for 20 min. The cells
released at this time were collected and immediately mixed with an equal volume of FCS to
inactivate trypsin. This procedure was repeated 6 times to release cells at 20 min intervals.
Cells released from the 3rd, 4th, 5th and 6th digestions (enriched for osteoblasts) were
combined and the cells were collected by centrifugation at 40 x g for 5 min. The cells were
then plated in αMEM containing 10% FCS and antibiotics and were grown to confluency
(2-3 days). At this stage the cells were plated for transfection in 60mm tissue culture
dishes at a cell density of 5 x 10^3 cells per dish. These primary osteoblast cultures are
capable of self-organizing into bone-like structures in prolonged cultures (Bellows et al.,
1986; Harris et al., 1994). HeLa, ROS 17/2.8, and CV-1 cells were purchased from the
ATCC.

The isolated FRC cells, enriched for the osteoblast phenotype, were used as
recipient cells for transient transfection assays. BMP-4 mRNA is modulated in these cells
in a transient fashion during prolonged culture (Harris et al., 1994b). The technique of
electroporation was used for DNA transfection (Potter, 1988; van den Hoff et al., 1992).
After electroporation, the cells were divided into aliquots, replated in 100mm diameter
culture dishes and cultured for 48 hours in modified Eagle's minimal essential media
(MEM, GIBCO, Grand Island, NY) with 10% fetal calf serum (FCS). The extracts were
assayed for CAT activity according to the method described by Gorman (1988) and CAT
activity was normalized by β-galactosidase assay according to the method of Rouet et al.
(1992).
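
The normalization described above can be sketched as follows, purely for illustration. The
Python sketch assumes that each transfected culture yields one reporter (CAT) reading and
one β-galactosidase reading from the same extract, divides the former by the latter to
correct for transfection efficiency, and then expresses each construct as a percentage of
a reference construct, as is done for pCAT-2.6 in Figure 4A. The function names and all
numerical readings are hypothetical.

```python
def normalize(reporter_activity, beta_gal_activity):
    """Reporter activity per unit of co-transfected beta-galactosidase activity."""
    if beta_gal_activity <= 0:
        raise ValueError("beta-galactosidase activity must be positive")
    return reporter_activity / beta_gal_activity


def percent_of_reference(raw, reference):
    """Express normalized activities as a percentage of a reference construct.

    raw maps construct name -> (reporter activity, beta-gal activity).
    """
    normalized = {name: normalize(rep, gal) for name, (rep, gal) in raw.items()}
    ref_value = normalized[reference]
    return {name: 100.0 * value / ref_value for name, value in normalized.items()}


if __name__ == "__main__":
    # Hypothetical readings in arbitrary units; not data from the disclosure.
    readings = {
        "pCAT-2.6": (800.0, 2.0),
        "pCAT-0.5": (100.0, 1.0),
    }
    for name, pct in percent_of_reference(readings, "pCAT-2.6").items():
        print(f"{name}: {pct:.1f}% of reference")
```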

Forty-eight hours after transfection with the various BMP-4-CAT reporter gene plasmid
constructs, the cells were harvested and the CAT activity was determined. As indicated in
Figures 4A and 4B, the pCAT-0.24 plasmid (-25/+212) has little CAT activity. This plasmid
contains -25 to +212 of the 5' non-coding exon 1A, and its activity was 3-fold lower than
that of the parent pBLCAT3 plasmid. The pCAT-0.5 (-260/+212), pCAT-1.3 (-1144/+212), and
pCAT-2.6 (-2372/+258) plasmids showed progressively increasing CAT activity when
transfected into FRC cells. These data are shown in Figure 4B. With pCAT-0.5 (-260/+212)
there is a 10-fold increase in CAT activity relative to pCAT-0.24 (-25/+212). pCAT-1.3
(-1144/+212) shows a further 6-fold increase and pCAT-2.6 (-2372/+258) shows a further
2-fold change over pCAT-1.3 (-1144/+212). Thus the net increase in CAT activity between
pCAT-0.24 (-25/+212) and pCAT-2.6 (-2372/+258) in FRC cells is approximately 100-fold.
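
As a brief arithmetic illustration, the stepwise increases recited above (10-fold, 6-fold
and 2-fold) compound multiplicatively, giving a product of 120, which is consistent with the
stated net increase of approximately 100-fold given the rounding of the individual factors.
The Python sketch below simply multiplies the recited factors; the names used are
hypothetical.

```python
from math import prod

# Stepwise fold increases in CAT activity as recited in the text.
STEPWISE_FOLD = [
    ("pCAT-0.24 -> pCAT-0.5", 10),
    ("pCAT-0.5 -> pCAT-1.3", 6),
    ("pCAT-1.3 -> pCAT-2.6", 2),
]


def cumulative_fold(steps):
    """Product of successive fold changes (1 for an empty list)."""
    return prod(factor for _, factor in steps)


if __name__ == "__main__":
    print("net fold increase:", cumulative_fold(STEPWISE_FOLD))
```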

Example 2: DESCRIPTION AND CHARACTERIZATION OF
MURINE BMP-2 GENE PROMOTER

(a) Cloning of Mouse BMP-2 Genomic DNA.

Genomic clones of the mouse BMP-2 gene were isolated in order to determine the
transcriptional regulation of the BMP-2 gene in primary osteoblasts. 5 x 10^6 plaques were
screened from a mouse genomic library, B6/CBA (purchased from Stratagene, San Diego,
CA), using BMP-2 cDNA as probe. The BMP-2 cDNA clone was isolated from a cDNA
library of PC3 prostate cancer cells (Harris et al., 1994). The human BMP-2 probe was a
1.1 kb SmaI fragment containing most of the coding region.

The BMP-2 genomic clones were sequenced by the dideoxy chain termination method
(Sanger et al., 1977), using deoxyadenosine 5'-[α-[35S]thio]triphosphate and Sequenase
(United States Biochemical, Cleveland, OH). All fragments were sequenced at least twice
and overlaps were established using the appropriate oligonucleotide primers. Primers were
prepared on an Applied Biosystems Model 392 DNA Synthesizer. Approximately 16 kb of
one of these BMP-2 clones was completely sequenced (Figure 9). Analysis of this
sequence showed that the mouse BMP-2 gene contains one non-coding and two coding exons
(Feng et al., 1994). Analysis of the 5' flanking sequence showed that the BMP-2 gene does
not contain typical TATA or CAAT boxes. However, a number of putative response
elements and transcription factor recognition sequences were identified upstream of exon 1
(Figure 5). The 5'-flanking region is GC-rich, with several SP-1, AP-1, P53, E-box,
homeobox, and AP-2 candidate DNA binding elements.

(b) Analysis of Transcription Start Site for BMP-2 Gene. 

The transcription start sites for the BMP-2 gene were identified using the primer
extension technique. Primer extension was carried out as described (Hall et al., 1993).
The primer used was a 32P-labeled 18-mer oligonucleotide, 5'-CCCGGCAAGTTCAAGAAG-3'
(SEQ ID NO:5). Total RNA obtained from primary fetal rat calvarial osteoblasts was
used for the primer extension. The results are shown in Figure 6. The major extension
product was 68 bp and was used to estimate the major transcription start site (+1, Figure
5). These results were confirmed by RNase protection assays.
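
As a simple consistency check on the primer, the following Python sketch takes the SEQ ID NO:5 sequence as it appears in the sequence listing and reports its length, its reverse complement and its GC fraction. It assumes, as primer extension requires, that the primer is antisense to the mRNA; the helper names are illustrative only.

```python
# Primer sequence as given for SEQ ID NO:5 in the sequence listing.
PRIMER = "CCCGGCAAGTTCAAGAAG"

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_fraction(seq):
    """Fraction of bases in the sequence that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


print(len(PRIMER))                  # primer length in nucleotides
print(reverse_complement(PRIMER))   # sense-strand stretch matched by the primer
print(round(gc_fraction(PRIMER), 2))
```

The length check confirms that the listed sequence is an 18-mer, in agreement with the description of the primer.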

(c) Identification of BMP-2 Promoter and Enhancer Activity Using Luciferase (LUC)
Reporter Gene Constructs.

The BMP-2-LUC constructs (Figure 7) were designed to contain variable 5'
boundaries from BMP-2 5'-flanking sequences spanning the transcription start site (+1).
Each construct contained the 3' boundary at +114 in exon 1 (Figure 6). These constructs
were individually transfected into primary cultures of fetal rat calvarial osteoblasts, ROS
17/2.8 osteosarcoma cells, HeLa cells, and CV-1 cells by the calcium-phosphate
precipitation technique, and the promoter activity of each construct was assayed
24 hrs following transfection by measuring the luciferase enzyme activity in each
individual cell lysate. The LUC (luciferase enzyme assay) technique is described below
under (f). Plasmid pSVβGal was co-transfected with each plasmid construct to normalize
for the transfection efficiency in each sample. The experiments were repeated at least five
times in independent fetal rat calvarial cultures, with each assay done in triplicate. The
mean values from a representative experiment are shown in Figure 8.

(d) Isolation of Primary Fetal Rat Calvarial Osteoblasts for Functional Studies
of BMP-2 Gene Promoter.

The cells used for transient transfection studies were isolated from 19 day-old fetal
rat calvariae by sequential digestion with trypsin and collagenase, as described by Bellows et
al. (1986) and Harris et al. (1994). In brief, the calvarial bones were surgically removed
and cleaned by washing in α minimal essential medium (αMEM) containing 10% v/v fetal
calf serum (FCS) and antibiotics. The bones were minced with scissors and transferred
to a 35 mm tissue culture dish containing 5 ml of sterile bacterial collagenase (0.1%) and
trypsin (0.05%). This was then incubated at 37°C for 20 min. The cells released at this
time were collected and immediately mixed with an equal volume of FCS to inactivate
the trypsin. This procedure was repeated 6 times to release cells at 20 min intervals. Cells
released from the 3rd, 4th, 5th and 6th digestions (enriched for osteoblasts) were combined
and collected by centrifugation at 400 g for 5 min. The cells were then plated in
αMEM containing 10% FCS and antibiotics and were grown to confluency (2-3 days). At
this stage the cells were plated for transfection in 60 mm tissue culture dishes at a cell
density of 5 x 10^3 cells per dish. These primary osteoblast cultures are capable of forming
mineralized bone in prolonged cultures (Bellows et al., 1986; Harris et al., 1994). HeLa,
ROS 17/2.8, and CV-1 cells were purchased from the ATCC.

(e) Transient Transfection Assay. 

For transient transfection assay, the primary osteoblast cells were plated at the
above-mentioned cell density 18-24 hrs prior to transfection. The transfection was carried
out using a modified calcium-phosphate precipitation method (Graham & van der Eb 1973;
Frost & Williams 1978). The cells were incubated for 4 hrs at 37°C with 500 µl of a
calcium phosphate precipitate of plasmid DNA containing 10 µg of reporter plasmid
construct and 1 µg of pSVβGal (for normalization of transfection efficiency) in 0.15 M
CaCl2 and Hepes buffered saline (21mM Hepes, 13.5mM NaCl, 5mM KCl, 0.7mM
Na2HPO4, 5.5mM dextrose, pH 7.05-7.1). After the 4 hr incubation of the cells with the
precipitate, the cells were subjected to a 2 min treatment with 15% glycerol in αMEM,
followed by addition of fresh αMEM containing insulin, transferrin and selenium (ITS)
(Upstate Biotechnology, Lake Placid, NY). The cells were harvested 24 hrs post
transfection.

(f) Luciferase and β-galactosidase Assay.

Cell lysates were prepared and the luciferase enzyme assay was carried out using the assay
protocols and the assay kit from Promega (Madison, WI). Routinely, 20 µl of cell lysate
was mixed with 100 µl of luciferase assay reagent (270 µM coenzyme A, 470 µM luciferin
and 530 µM ATP) and the luciferase activity was measured for 10 sec in a TURNER
TD-20e luminometer. The values were normalized with respect to the β-galactosidase
enzyme activity obtained for each experimental sample.

The β-galactosidase enzyme activity was measured in the cell lysate using a 96-well
microtiter plate according to Rouet et al. (1992). 10-20 µl of cell lysate was added to
90-80 µl of β-galactosidase reaction buffer containing 88mM phosphate buffer, pH 7.3, 11mM
KCl, 1mM MgCl2, 55mM β-mercaptoethanol, and 4.4mM chlorophenol red
β-D-galactopyranoside (Boehringer-Mannheim Corp., Indianapolis, IN). The reaction
mixture was incubated at 37°C for 30-60 min, depending on transfection efficiency, and the
samples were read with an ELISA plate reader at 600nm.
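
The normalization described in this section can be sketched as follows. This is a minimal illustration that assumes the normalized value is the ratio of the luciferase reading to the β-galactosidase reading from the same lysate, averaged over replicate wells; the function names and the example readings are hypothetical and are not data from the experiments.

```python
from statistics import mean


def normalized_activity(luciferase, beta_gal):
    """Luciferase reading divided by the beta-galactosidase reading of the same lysate."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase reading must be positive")
    return luciferase / beta_gal


def construct_mean(replicates):
    """Mean normalized activity over replicate (luciferase, beta_gal) pairs."""
    return mean(normalized_activity(luc, gal) for luc, gal in replicates)


# Hypothetical triplicate readings in arbitrary units, for illustration only.
example_triplicate = [(1000.0, 0.5), (500.0, 0.25), (1500.0, 0.75)]
print(construct_mean(example_triplicate))
```

Dividing by the co-transfected β-galactosidase signal means that wells which took up more DNA do not appear to carry a stronger promoter.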

(g) Plasmid Construction

The luciferase basic plasmid (pGL basic) was the vector used for all constructs
(purchased from Promega, Madison, WI). Different lengths of DNA fragments from the
BMP-2 5'-flanking region were cloned at the multiple cloning site of this plasmid, which is
upstream of the firefly luciferase cDNA. The BMP-2 DNA fragments were isolated either
by using available restriction enzyme sites (constructs -196/+114, -876/+114, -1995/+114,
-2483/+114, and -2736/+114) or by polymerase chain reaction using specific oligonucleotide
primers (constructs -23/+114, -123/+114 and +29/+114).

The minimal promoter activity for the BMP-2 gene was identified in the shortest
construct containing 23 bp upstream of the transcription start site (-23/+114). No luciferase
activity was noted in the construct that did not include the transcription start site
(+29/+114). Two other constructs containing increasing lengths of 5' sequence up to
-196 bp showed reproducible decreases in promoter activity in fetal rat calvarial osteoblasts
and HeLa cells (Figure 8). The -876/+114 construct showed a 5-fold increase in activity in
HeLa cells. The -1995/+114, -2483/+114 and -2736/+114 constructs showed decreased
promoter activity when compared to the -876/+114 construct only in HeLa cells (Figure 8).

In the primary fetal rat calvarial osteoblasts, the 2.6 kb construct (-2483/+114)
demonstrated a 2-3-fold increase in luciferase activity over that of the -1995/+114
construct (Figure 8). These results suggest that one or more positive response regions are
present between -196 and -1995 and that the DNA sequence between -1995 and -2483 bp
has other positive regulatory elements that could modulate BMP-2 transcription. The
largest, 2.9 kb construct (-2736/+114) repeatedly demonstrated a 20-50% decrease in
promoter activity compared to the -2483/+114 construct in these primary fetal rat calvarial
osteoblasts (Figure 8).
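
The construct sizes quoted in kb can be related to the 5'/3' coordinates with a short calculation. The Python sketch below assumes the usual promoter numbering in which +1 is the transcription start site, -1 is the base immediately upstream, and there is no position 0; that convention is an assumption here, since the text does not state it, and the function name is illustrative.

```python
def insert_length_bp(five_prime, three_prime):
    """Length of a fragment from five_prime to three_prime, inclusive.

    Assumes promoter-style numbering with no position 0 (..., -2, -1, +1, +2, ...).
    """
    if five_prime == 0 or three_prime == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if five_prime > three_prime:
        raise ValueError("5' boundary must not lie downstream of the 3' boundary")
    length = three_prime - five_prime + 1
    if five_prime < 0 < three_prime:
        length -= 1  # correct for the missing position 0
    return length


for name, (start, end) in {
    "-23/+114": (-23, 114),
    "+29/+114": (29, 114),
    "-2483/+114": (-2483, 114),
    "-2736/+114": (-2736, 114),
}.items():
    print(name, insert_length_bp(start, end), "bp")
```

Under this convention the -2483/+114 fragment is 2597 bp and the -2736/+114 fragment is 2850 bp, in line with the approximate 2.6 kb and 2.9 kb sizes given above.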

In ROS 17/2.8 osteosarcoma cells, the BMP-2 promoter activity was consistently
higher than in either the primary fetal rat calvarial osteoblasts or HeLa cells (Figure 8). All of
the deletion constructs showed similar promoter activity in ROS 17/2.8 osteosarcoma cells.
The transformed state of ROS 17/2.8 cells may be responsible for the marked expression of
the BMP-2 gene. ROS 17/2.8 cells represent a well-differentiated osteosarcoma and they
produce high levels of BMP-2 mRNA. They form tumors in nude mice with bone-like
material in the tumor (Majeska et al., 1978; Majeska et al., 1980).

(h) Specificity of the BMP-2 Promoter. 

To analyze the activity of the BMP-2 promoter in cell types not expressing BMP-2 
mRNA, BMP-2 promoter constructs were transfected into CV-1 cells (monkey kidney 
cells). The BMP-2 promoter activity was found to be very low for all constructs. This 
suggests that this region of the BMP-2 promoter is functional only in cells such as primary 
fetal rat calvarial osteoblasts, HeLa and ROS 17/2.8 that express endogenous BMP-2 
mRNA (Anderson & Coulter 1968). CV-1 cells do not express BMP-2 mRNA. The 
BMP-2 promoter is likely active in other cell types that express BMP-2, such as prostate
cells and chondrocytes, although regulation of transcription may be different in these cells. 

Example 3 : USE OF PLASMID CONSTRUCTS CONTAINING BMP 
PROMOTERS WITH REPORTER GENES TO IDENTIFY 
OSTEOGENIC AGENTS 

Plasmid constructs containing BMP promoters with reporter genes have been
transfected into osteoblastic cells. The cells which have been utilized include primary
cultures of fetal rat calvarial osteoblasts, cell lines obtained as gifts or commercially
(MC3T3-E12 cells, MG-63 cells, U2OS cells, UMR106 cells, ROS 17/2.8 cells, SaOS2
cells, and the like as provided in the catalog from the ATCC) and bone and cartilage cell
lines established from transgenic mice. The bone cells are transfected transiently or stably
with the plasmid constructs, exposed to the chemical compound, agent or factor to be
tested for 48 hours, and then luciferase or CAT activity is measured in the cell lysates.

Regulation of expression of the growth factor is assessed by culturing bone cells in
αMEM medium with 10% fetal calf serum, 1% penicillin/streptomycin and 1%
glutamine. The cells are placed in microtiter plates at a cell density of 5 x 10^3 cells
/100 µl/well. The cells are allowed to adhere and are then incubated at 37°C at 5% CO2 for 24
hours. The media is then removed and replaced with 50 µl of αMEM with 4% fetal calf
serum, and a 50 µl aliquot containing the compound or factor to be tested in 0.1% BSA
solution is added to each well. The final volume is 100 µl and the final serum concentration
is 2% fetal calf serum. Recombinant rat BMP-2 expressed in Chinese hamster ovary cells is
used as a positive control.
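
The final serum concentration follows from a simple dilution calculation, sketched below in Python. The sketch assumes that the 50 µl compound aliquot contains no serum, which is implied by but not stated in the text; the function name is illustrative.

```python
def final_percent(stock_percent, stock_volume_ul, added_volume_ul):
    """Concentration (in percent) after adding a component-free volume to a stock."""
    total_volume_ul = stock_volume_ul + added_volume_ul
    return stock_percent * stock_volume_ul / total_volume_ul


# 50 µl of medium with 4% FCS plus a 50 µl compound aliquot assumed to be serum-free.
serum_final = final_percent(4.0, 50.0, 50.0)
print(f"{serum_final}% FCS in {50.0 + 50.0} µl")
```

The same calculation shows that the test compound is also diluted two-fold in the well relative to its concentration in the 50 µl aliquot.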

The treated cells are incubated at 37°C at 5% CO2 for 48 hours. The media is then
removed and the cells are rinsed 3 times with phosphate buffered saline (PBS). Excess
PBS is removed from the wells and 100 µl of cell culture lysing reagent (Promega #E153A)
is added to each well. After 10 minutes, 10 µl of the cell lysate is added to a 96-well white
luminometric plate (Dynatech Labs #07100) containing 100 µl of luciferase assay buffer with
substrate (Promega #E152A). The luciferase activity is read using a Dynatech ML2250
automated 96-well luminometer. The data are expressed as either picograms of luciferase
activity per well or picograms of luciferase per µg protein.

Example 4 : DEMONSTRATION THAT BONE CELLS
TRANSFECTED WITH BMP PROMOTERS CAN
BE USED TO SCREEN FOR OSTEOGENIC AGENTS

To demonstrate that the present invention is useful in evaluating potential 
osteogenic agents, a random array of chemical compounds from a chemical library obtained 
commercially was screened. It was found that approximately 1 in 100 such compounds 
screened produces a positive response in the present assay system compared with the 
positive control, recombinant BMP-2, which is known to enhance BMP-2 transcription. 
Compounds identified from the random library were subjected to detailed dose-response
analysis to demonstrate that they enhance BMP messenger RNA expression, and that they
enhance other biological effects in vitro, such as expression of structural proteins including
osteocalcin, osteopontin and alkaline phosphatase, and enhance bone nodule formation in
prolonged primary cultures of calvarial rodent osteoblasts.
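
One way a primary screen of this kind might be scored is sketched below in Python. The text does not specify how a positive response was defined, so everything in this sketch is an assumption made for illustration: the response is expressed as a percentage of the window between vehicle-treated wells and the recombinant BMP-2 positive control, and the 50% cutoff, the function names and the readings are all hypothetical.

```python
def percent_of_control(sample, vehicle, positive):
    """Reporter signal as a percentage of the vehicle-to-positive-control window."""
    window = positive - vehicle
    if window <= 0:
        raise ValueError("positive control must exceed the vehicle signal")
    return 100.0 * (sample - vehicle) / window


def call_hits(readings, vehicle, positive, threshold_percent=50.0):
    """Names of compounds whose response meets a hypothetical threshold."""
    return [
        name
        for name, value in readings.items()
        if percent_of_control(value, vehicle, positive) >= threshold_percent
    ]


# Hypothetical luciferase readings (arbitrary units), not data from the screen.
readings = {"compound_A": 120.0, "compound_B": 480.0, "compound_C": 100.0}
print(call_hits(readings, vehicle=100.0, positive=500.0))
```

Compounds flagged in this way would then go on to the dose-response and secondary assays described above.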

Compounds identified in this way can be tested for their capacity to stimulate bone
formation in vivo in mice. To demonstrate this, the compound can be injected locally into
subcutaneous tissue over the calvarium of normal mice and the bone changes then
followed histologically. It has been found that certain compounds identified by the present
invention stimulate the formation of new bone in this in vivo assay system.

The effects of compounds are tested in ICR Swiss mice, aged 4-6 weeks and
weighing 13-26g. The compound at 20mg/kg or vehicle alone (100 µl of 5% DMSO and
phosphate-buffered 0.9% saline) is injected three times daily for 7 days. The injections
are given into the subcutaneous tissues overlying the right side of the calvaria of five mice
in each treatment group in each experiment.
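
The dosing arithmetic for this protocol can be sketched as follows. The Python sketch assumes that 20 mg/kg is the amount given per injection and that the compound is delivered in the same 100 µl volume stated for the vehicle; the text leaves both points open, so these are labeled assumptions, and the function names are illustrative.

```python
def dose_per_injection_mg(body_weight_g, dose_mg_per_kg=20.0):
    """Compound mass per injection for a mouse of the given weight."""
    return dose_mg_per_kg * body_weight_g / 1000.0


def concentration_mg_per_ml(dose_mg, volume_ul=100.0):
    """Concentration needed to deliver dose_mg in the assumed injection volume."""
    return dose_mg / (volume_ul / 1000.0)


def total_injections(per_day=3, days=7):
    """Number of injections over the whole treatment period."""
    return per_day * days


for weight_g in (13.0, 26.0):  # weight range given in the text
    dose = dose_per_injection_mg(weight_g)
    print(weight_g, "g:", round(dose, 2), "mg,",
          round(concentration_mg_per_ml(dose), 1), "mg/ml")
print(total_injections(), "injections per mouse")
```

Across the stated weight range the per-injection amount doubles, which is why dosing solutions are usually adjusted to individual body weight.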

Mice are killed by ether inhalation on day 14, i.e. 7 days after the last injection of
compound. After fixation in 10% phosphate-buffered formalin, the calvariae are examined.
The occipital bone is removed by cutting immediately behind and parallel to the lambdoid
suture, and the frontal bone is removed by cutting anterior to the coronal suture using a
scalpel blade. The bones are then bisected through the coronal plane and the 3- to 4-mm
strips of bone are decalcified in 14% EDTA, dehydrated in graded alcohols, and embedded
in paraffin. Four 3 µm thick nonconsecutive step sections are cut from each specimen and
stained using hematoxylin and eosin.

Two representative sections from the posterior calvarial strips are used.
Histological measurements are carried out using a digitizing tablet and the Osteomeasure
image analysis system (Osteometrics Inc., Atlanta, GA) on the injected and noninjected
sides of the calvariae in a standard length of bone between the sagittal suture and the
muscle insertion of the lateral border of each bone. Measurements consist of (1) Total
bone area (i.e., bone and marrow between inner and outer periosteal surfaces); (2) Area of
new woven bone formed on the outer calvarial surface; (3) The extent of osteoblast-lined
surface on the outer calvarial surface; (4) The area of the outer periosteum; and (5) The
length of calvarial surface. From these measurements, the mean width of new bone and
periosteum and the percentage of surface lined by osteoblasts on the outer calvarial surface
can be determined.
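
The derived indices can be computed from the primary measurements as sketched below. The Python sketch assumes that mean width is taken as area divided by the length of calvarial surface and that the osteoblast-lined percentage is the lined extent divided by the same length; the text does not give the formulas, and the class name, field names, units and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CalvarialMeasurements:
    """Primary measurements for one section (units assumed: mm and mm^2)."""
    new_bone_area_mm2: float
    periosteum_area_mm2: float
    osteoblast_surface_mm: float
    surface_length_mm: float


def derived_indices(m):
    """Mean widths and percentage of osteoblast-lined surface for one section."""
    if m.surface_length_mm <= 0:
        raise ValueError("surface length must be positive")
    return {
        "new_bone_width_mm": m.new_bone_area_mm2 / m.surface_length_mm,
        "periosteum_width_mm": m.periosteum_area_mm2 / m.surface_length_mm,
        "osteoblast_surface_percent": 100.0 * m.osteoblast_surface_mm / m.surface_length_mm,
    }


# Hypothetical values for illustration only.
section = CalvarialMeasurements(0.12, 0.08, 3.0, 4.0)
for key, value in derived_indices(section).items():
    print(key, round(value, 3))
```

Normalizing to the measured surface length lets the injected and noninjected sides of the same calvaria be compared directly.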

By reference to the above disclosure and examples, it is seen that the present 
invention provides a new cell-based assay for identifying and evaluating compounds which 
stimulate the growth of bone. Also provided in accordance with the present invention are 
promoter regions of bone morphogenetic protein genes, and a system for identifying 
osteogenic agents utilizing such promoters operatively linked to reporter genes in 
expression vectors. 

The present invention provides the means to specifically identify osteogenic agents 
which stimulate bone cells to produce bone growth factors in the bone morphogenetic 
protein family. These osteogenic agents are shown to be useful to increase the activity of 
the promoters of genes of members of the BMP family and other bone growth factors 
normally produced by bone cells. 



Example 5 : RESEQUENCING OF THE BMP-2 5' FLANKING REGION

The BMP-2 5' flanking region described in Example 2 was resequenced. The
nucleotide sequence of the 5' flanking region of the mouse BMP-2 gene is provided in
Figure 11. The sequence information in Figure 11 corrects sequencing errors that are
present in Figures 5 and 9. The nucleotide sequence of Figure 11 replaces bases -2736 to
+119 provided in Figure 5 and bases 1 to 2855 provided in Figure 9. The non-nucleotide
sequence information provided in Figure 5 is applicable to the corresponding bases in
Figure 11 where such bases are present.

All publications and patent applications cited in this specification are herein
incorporated by reference as if each individual publication or patent application were
specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of
illustration and example for purposes of clarity and understanding, it will be apparent to
those of ordinary skill in the art in light of the teaching of this invention that certain changes
and modifications may be made thereto without departing from the spirit or scope of the
appended claims.

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2310 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 768..1991

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GGGAGGAAGG GAAGAAAGAG AGGGAGGGAA AAGAGAAGGA AGGAGTAGAT GTGAGAGGGT 60

GGTGCTGAGG GTGGGAAGGC AAGAGCGCGA GGCCTGGCCC GGAAGCTAGG TGAGTTCGGC 120

ATCCGAGCTG AGAGACCCCA GCCTAAGACG CCTGCGCTGC AACCCAGCCT GAGTATCTGG 180

TCTCCGTCCC TGATGGGATT CTCGTCTAAA CCGTCTTGGA GCCTGCAGCG ATCCAGTCTC 240

TGGCCCTCGA CCAGGTTCAT TGCAGCTTTC TAGAGGTCCC CAGAAGCAGC TGCTGGCGAG 300

CCCGCTTCTG CAGGAACCAA TGGTGAGCTC GAGTGCAGGC CGAAAGCTGT TCTCGGGTTT 360

GTAGACGCTT GGGATCGCGC TTGGGGTCTC CTTTCGTGCC GGGTAGGAGT TGTAAAGCCT 420

TTGCAACTCT GAGATCGTAA AAAAAATGTG ATGCGCTCTT TCTTTGGCGA CGCCTGTTTT 480

GGAATCTGTC CGGAGTTAGA AGCTCAGACG TCCACCCCCC ACCCCCCGCC CACCCCCTCT 540

GCCTTGAATG GCACCGCCGA CCGGTTTCTG AAGGATCTGC TTGGCTGGAG CGGACGCTGA 600

GGTTGGCAGA CACGGTGTGG ATTTTAGGAG CCATTCCGTA GTGCCATTCG GAGCGACGCA 660

CTGCCGCAGC TTCTCTGAGC CTTTCCAGCA AGTTTGTTCA AGATTGGCTC CCAAGAATCA 720

TGGACTGTTA TTATGCCTTG TTTTCTGTCA GTGAGTCCAG AGACACC ATG ATT CCT 776
Met Ile Pro
1

GGT AAC CGA ATG CTG ATG GTC GTT TTA TTA TGC CAA GTC CTG CTA GGA 824
Gly Asn Arg Met Leu Met Val Val Leu Leu Cys Gln Val Leu Leu Gly
5 10 15

GGC GCG AGC CAT GCT AGT TTG ATA CCT GAG ACC GGG AAG AAA AAA GTC 872
Gly Ala Ser His Ala Ser Leu Ile Pro Glu Thr Gly Lys Lys Lys Val
20 25 30 35

GCC GAG ATT CAG GGC CAC GCG GGA GGA CGC CGC TCA GGG CAG AGC CAT 920
Ala Glu Ile Gln Gly His Ala Gly Gly Arg Arg Ser Gly Gln Ser His
40 45 50

GAG CTC CTG CGG GAC TTC GAG GCG ACA CTT CTA CAG ATG TTT GGG CTG 968
Glu Leu Leu Arg Asp Phe Glu Ala Thr Leu Leu Gln Met Phe Gly Leu
55 60 65

CGC CGC CGT CCG CAG CCT AGC AAG AGC GCC GTC ATT CCG GAT TAC ATG 1016
Arg Arg Arg Pro Gln Pro Ser Lys Ser Ala Val Ile Pro Asp Tyr Met
70 75 80

AGG GAT CTT TAC CGG CTC CAG TCT GGG GAG GAG GAG GAG GAA GAG CAG 1064
Arg Asp Leu Tyr Arg Leu Gln Ser Gly Glu Glu Glu Glu Glu Glu Gln
85 90 95

AGC CAG GGA ACC GGG CTT GAG TAC CCG GAG CGT CCC GCC AGC CGA GCC 1112
Ser Gln Gly Thr Gly Leu Glu Tyr Pro Glu Arg Pro Ala Ser Arg Ala
100 105 110 115

AAC ACT GTG AGG AGT TTC CAT CAC GAA GAA CAT CTG GAG AAC ATC CCA 1160
Asn Thr Val Arg Ser Phe His His Glu Glu His Leu Glu Asn Ile Pro
120 125 130

GGG ACC AGT GAG AGC TCT GCT TTT CGT TTC CTC TTC AAC CTC AGC AGC 1208
Gly Thr Ser Glu Ser Ser Ala Phe Arg Phe Leu Phe Asn Leu Ser Ser
135 140 145

ATC CCA GAA AAT GAG GTG ATC TCC TCG GCA GAG CTC CGG CTC TTT CGG 1256
Ile Pro Glu Asn Glu Val Ile Ser Ser Ala Glu Leu Arg Leu Phe Arg
150 155 160

GAG CAG GTG GAC CAG GGC CCT GAC TGG GAA CAG GGC TTC CAC CGT ATA 1304
Glu Gln Val Asp Gln Gly Pro Asp Trp Glu Gln Gly Phe His Arg Ile
165 170 175

AAC ATT TAT GAG GTT ATG AAG CCC CCA GCA GAA ATG GTT CCT GGA CAC 1352
Asn Ile Tyr Glu Val Met Lys Pro Pro Ala Glu Met Val Pro Gly His
180 185 190 195

CTC ATC ACA CGA CTA CTG GAC ACC AGA CTA GTC CAT CAC AAT GTG ACA 1400
Leu Ile Thr Arg Leu Leu Asp Thr Arg Leu Val His His Asn Val Thr
200 205 210

CGG TGG GAA ACT TTC GAT GTG AGC CCT GCA GTC CTT CGC TGG ACC CGG 1448
Arg Trp Glu Thr Phe Asp Val Ser Pro Ala Val Leu Arg Trp Thr Arg
215 220 225

GAA AAG CAA CCC AAT TAT GGG CTG GCC ATT GAG GTG ACT CAC CTC CAC 1496
Glu Lys Gln Pro Asn Tyr Gly Leu Ala Ile Glu Val Thr His Leu His
230 235 240

CAG ACA CGG ACC CAC CAG GGC CAG CAT GTC AGA ATC AGC CGA TCG TTA 1544
Gln Thr Arg Thr His Gln Gly Gln His Val Arg Ile Ser Arg Ser Leu
245 250 255

CCT CAA GGG AGT GGA GAT TGG GCC CAA CTC CGC CCC CTC CTG GTC ACT 1592
Pro Gln Gly Ser Gly Asp Trp Ala Gln Leu Arg Pro Leu Leu Val Thr
260 265 270 275

TTT GGC CAT GAT GGC CGG GGC CAT ACC TTG ACC CGC AGG AGG GCC AAA 1640
Phe Gly His Asp Gly Arg Gly His Thr Leu Thr Arg Arg Arg Ala Lys
280 285 290

CGT AGT CCC AAG CAT CAC CCA CAG CGG TCC AGG AAG AAG AAT AAG AAC 1688
Arg Ser Pro Lys His His Pro Gln Arg Ser Arg Lys Lys Asn Lys Asn
295 300 305

TGC CGT CGC CAT TCA CTA TAC GTG GAC TTC AGT GAC GTG GGC TGG AAT 1736
Cys Arg Arg His Ser Leu Tyr Val Asp Phe Ser Asp Val Gly Trp Asn
310 315 320

GAT TGG ATT GTG GCC CCA CCC GGC TAC CAG GCC TTC TAC TGC CAT GGG 1784
Asp Trp Ile Val Ala Pro Pro Gly Tyr Gln Ala Phe Tyr Cys His Gly
325 330 335

GAC TGT CCC TTT CCA CTG GCT GAT CAC CTC AAC TCA ACC AAC CAT GCC 1832
Asp Cys Pro Phe Pro Leu Ala Asp His Leu Asn Ser Thr Asn His Ala
340 345 350 355

ATT GTG CAG ACC CTA GTC AAC TCT GTT AAT TCT AGT ATC CCT AAG GCC 1880
Ile Val Gln Thr Leu Val Asn Ser Val Asn Ser Ser Ile Pro Lys Ala
360 365 370

TGT TGT GTC CCC ACT GAA CTG AGT GCC ATT TCC ATG TTG TAC CTG GAT 1928
Cys Cys Val Pro Thr Glu Leu Ser Ala Ile Ser Met Leu Tyr Leu Asp
375 380 385

GAG TAT GAC AAG GTG GTG TTG AAA AAT TAT CAG GAG ATG GTG GTA GAG 1976
Glu Tyr Asp Lys Val Val Leu Lys Asn Tyr Gln Glu Met Val Val Glu
390 395 400

GGG TGT GGA TGC CGC TGAGATCAGA CAGTCCGGAG GGCGGACACA CACACACACA 2031
Gly Cys Gly Cys Arg
405

CACACACACA CACACACACA CACACACACA CGTTCCCATT CAACCACCTA CACATACCAC 2091 

ACAAACTGCT TCCCTATAGC TGGACTTTTA TCTTAAAAAA AAAAAAAAGA AAGAAAGAAA 2151 

GAAAGAAAGA AAAAAAATGA AAGACAGAAA AGAAAAAAAA AACCCTAAAC AACTCACCTT 2211 

GACCTTATTT ATGACTTTAC GTGCAAATGT TTTGACCATA TTGATCATAT TTTGACAAAT 2271 

ATATTTATAA AACTACATAT TAAAAGAAAA TAAAATGAG 2310 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 408 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE : protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Ile Pro Gly Asn Arg Met Leu Met Val Val Leu Leu Cys Gln Val
1 5 10 15

Leu Leu Gly Gly Ala Ser His Ala Ser Leu Ile Pro Glu Thr Gly Lys
20 25 30

Lys Lys Val Ala Glu Ile Gln Gly His Ala Gly Gly Arg Arg Ser Gly
35 40 45

Gln Ser His Glu Leu Leu Arg Asp Phe Glu Ala Thr Leu Leu Gln Met
50 55 60

Phe Gly Leu Arg Arg Arg Pro Gln Pro Ser Lys Ser Ala Val Ile Pro
65 70 75 80

Asp Tyr Met Arg Asp Leu Tyr Arg Leu Gln Ser Gly Glu Glu Glu Glu
85 90 95

Glu Glu Gln Ser Gln Gly Thr Gly Leu Glu Tyr Pro Glu Arg Pro Ala
100 105 110

Ser Arg Ala Asn Thr Val Arg Ser Phe His His Glu Glu His Leu Glu
115 120 125

Asn Ile Pro Gly Thr Ser Glu Ser Ser Ala Phe Arg Phe Leu Phe Asn
130 135 140

Leu Ser Ser Ile Pro Glu Asn Glu Val Ile Ser Ser Ala Glu Leu Arg
145 150 155 160

Leu Phe Arg Glu Gln Val Asp Gln Gly Pro Asp Trp Glu Gln Gly Phe
165 170 175

His Arg Ile Asn Ile Tyr Glu Val Met Lys Pro Pro Ala Glu Met Val
180 185 190

Pro Gly His Leu Ile Thr Arg Leu Leu Asp Thr Arg Leu Val His His
195 200 205

Asn Val Thr Arg Trp Glu Thr Phe Asp Val Ser Pro Ala Val Leu Arg
210 215 220

Trp Thr Arg Glu Lys Gln Pro Asn Tyr Gly Leu Ala Ile Glu Val Thr
225 230 235 240

His Leu His Gln Thr Arg Thr His Gln Gly Gln His Val Arg Ile Ser
245 250 255

Arg Ser Leu Pro Gln Gly Ser Gly Asp Trp Ala Gln Leu Arg Pro Leu
260 265 270

Leu Val Thr Phe Gly His Asp Gly Arg Gly His Thr Leu Thr Arg Arg
275 280 285

Arg Ala Lys Arg Ser Pro Lys His His Pro Gln Arg Ser Arg Lys Lys
290 295 300

Asn Lys Asn Cys Arg Arg His Ser Leu Tyr Val Asp Phe Ser Asp Val
305 310 315 320

Gly Trp Asn Asp Trp Ile Val Ala Pro Pro Gly Tyr Gln Ala Phe Tyr
325 330 335

Cys His Gly Asp Cys Pro Phe Pro Leu Ala Asp His Leu Asn Ser Thr
340 345 350

Asn His Ala Ile Val Gln Thr Leu Val Asn Ser Val Asn Ser Ser Ile
355 360 365

Pro Lys Ala Cys Cys Val Pro Thr Glu Leu Ser Ala Ile Ser Met Leu
370 375 380

Tyr Leu Asp Glu Tyr Asp Lys Val Val Leu Lys Asn Tyr Gln Glu Met
385 390 395 400

Val Val Glu Gly Cys Gly Cys Arg
405

(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2688 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE : DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 

GAATTCGCTA GGTAGACCAG GCTGGCCCAG AACACCTAGA GATCATCTGG CTGCCTCTGT 60 

CTCTTGAGTT CTGGGGCTAA AGCATGCACC ACTCTACCTG GCTAGTTTGT ATCCATCTAA 120 

ATTGGGGAAG AAAGAAGTAC AGCTGTCCCC AGAGATAACA GCTGGGTTTT CCCATCAAAC 180 

ACCTAGAAAT CCATTTTAGA TTCTAAATAG GGTTTGTCAG GTAGCTTAAT TAGAACTTTC 240 

AGACTGGGTT TCACAGACTG GTTGGGCCAA AGGTCACTTT ATTGTCTGGG TTTCAGCAAA 300 

ATGAGACAAT AGCTGTTATT CAAACAACAT TTGGGTAAGG AAGAAAAATG AACAAACACC 360 

ACTCTCCCTC CCCCCGCTCC GTGCCTCCAA ATCCATTAAA GGCAAAGCTG CACCCCTAAG 420 

GACAACGAAT CGCTGCTGTT TGTGAGTTTA AATATTAAGG AACACATTGT GTTAATGATT 480 

GGAGCAGCAG TGATTGATGT AGTGGCATTG GTGAGCACTG AATCCGTCCT TCAACCTGCT 540 

ATGGGAGCAC AGAGCCTGAT GCCCCAGGAG TAATGTAATA GAGTAATGTA ATGTAATGGA 600 

GTTTTAATTT TGTGTTGTTG TTTTAAATAA TTAATTGTAA TTTTGGCTGT GTTAGAAGCT 660

GTGGGTACGT TTCTCAGTCA TCTTTTCGGT CTGGTGTTAT TGCCATACCT TGATTAATCG 720 

GAGATTAAAA GAGAAGGTGT ACTTAGAAAC GATTTCAAAT GAAAGAAGGT ATGTTTCCAA 780 

TGTGACTTCA CTAAAGTGAC AGTGACGCAG GGAATCAATC GTCTTCTAAT AGAAAGGGCT 840 

CATGGAGACC TGAGCTGAAT CTTTCTGTTC TGGATGAGAG AGGTGGTACC CATTGGAATG 900 

AAAGGACTTA GTCAGGGGCA ATACAGTGTG CTCCAAGGCT GGGGATGGTC AGGATGTTGT 960 

GCTCAGCCTC TAACACTCCT TCCAACCTGA CATTCCTTCT CACCCTTTGT CTCTGGCCAG 1020 

TAGAATACAG GAACTCGTTC CTGTTTTTTT TTTTTTAAAT TCTGAAGGTG TGTAAGTACA 1080 

AAGGTCAGAT GAGCGGCCCT AGGTCAAGAC TGCTTTGTGG TGACAAGGGA GTATAACACC 1140

CACCCCAGAA ACCAAGAACC GGAAATTGCT ATCTTCCAGC CCTTTGAGAG CTACCTGAAG 1200 

CTCTGGGCTG CTGGCCTCAC CCCTTCCCTG CAGCTTTCCC TTTAGCAGAG GCTGTGATTT 1260 

CCTTCAGCGC TTGGGCAAAT ACTCTTAGCC TGGCTCACCT TCCCCATCCT CGTTTGTAAA 1320 

AACAAAGATG AAGCTGATAG TTCCTTCCCA GCTCCATCAG AGGCAGGGTG TGAAATTAGC 1380 

TCCTGTTTGG GAAGGTTTAA AAGCCGGCCA CATTCCACCT CCCAGCTAGC ATGATTACCA 1440 

ACTCTTGTTT CTTACTGTTG TTATGAAAGA CTCAATTCCT CATCTCCCTT TCCCTTCTTT 1500 

TAAAAAGGGG CCAAAGGGCA CTTTGTTTTT TTCTCTACAT GGCCTAAAAG GCACTGTGTT 1560 

ACCTTCCTGG AAGGTCCCAA ACAAACAAAC AAACAAACAA AATAACCATC TGGCAGTTAA 1620 

GAAGGCTTCA GAGATATAAA TAGGATTTTC TAATTGTCTT ACAAGGCCTA GGCTGTTTGC 1680 

CTGCCAAGTG CCTGCAAACT ACCTCTGTGC ACTTGAAATG TTAGACCTGG GGGATCGATG 1740 

GAGGGCACCC AGTTTAAGGG GGGTTGGTGC AATTCTCAAA TGTCCACAAG AAACATCTCA 1800 

CAAAAACTTT TTTGGGGGGA AAGTCACCTC CTAATAGTTG AAGAGGTATC TCCTTCGGGC 1860 

ACACAGCCCT GCTCACAGCC TGTTTCAACG TTTGGGAATC CTTTAACAGT TTACGGAAGG 1920 

CCACCCTTTA AACCAATCCA ACAGCTCCCT TCTCCATAAC CTGATTTTAG AGGTGTTTCA 1980 

TTATCTCTAA TTACTCGGGG TAAATGGTGA TTACTCAGTG TTTTAATCAT CAGTTTGGGC 2040 

AGCAGTTATT CTAAACTCAG GGAAGCCCAG ACTCCCATGG GTATTTTTGG AAGGTACAGA 2100 

GACTAGTTGG TGCATGCTTT CTAGTACCTC TTGCATGTGG TCCCCAGGTG AGCCCCGGCT 2160 

GCTTCCCGAG CTGGAGGCAT CGGTCCCAGC CAAGGTGGCA ACTGAGGGCT GGGGAGCTGT 2220 

GCAATCTTCC GGACCCGGCC TTGCCAGGCG AGGCGAGGCC CCGTGGCTGG ATGGGAGGAT 2280

GTGGGCGGGG CTCCCCATCC CAGAAGGGGA GGCGATTAAG GGAGGAGGGA AGAAGGGAGG 2340 

GGCCGCTGGG GGGAAAGACT GGGGAGGAAG GGAAGAAAGA GAGGGAGGGA AAAGAGAAGG 2400 

AAGGAGTAGA TGTGAGAGGG TGGTGCTGAG GGTGGGAAGG CAAGAGCGCG AGGCCTGGCC 2460 

CGGAAGCTAG GTGAGTTCGG CATCCGAGCT GAGAGACCCC AGCCTAAGAC GCCTGCGCTG 2520 

CAACCCAGCC TGAGTATCTG GTCTCCGTCC CTGATGGGAT TCTCGTCTAA ACCGTCTTGG 2580

AGCCTGCAGC GATCCAGTCT CTGGCCCTCG ACCAGGTTCA TTGCAGCTTT CTAGAGGTCC 2640 

CCAGAAGCAG CTGCTGGCGA GCCCGCTTCT GCAGGAACCA ATGGTGAG 2688

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2875 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
GAATTCATTT AAGCTGGATT CACTTCTAGG TCCCATGCGT TTACACTCAT TTCCACCACA 60

AGAGGGCAGC CATCTCTAAA AAAACAACAG TCGAGTGCTC TTCAGAGAAA TTGGGCCAAA 120

CTTGAGGAAA GTTCCTGGGA AAGGCTTTTT AGCAGCACCT CTCTGGGCTA CAAAAAAGAA 180

GCCAGCAGGC ACCACCAAGG TGGAGTAACT GTCCAGAGGC ATCCATTTTA CCTCAGAGAC 240

TTGATTACTA AGGATATCCT AAACGGCCAA ACTCTCTCTT CTGGTGTTCC AGAGGCCCAA 300

AGCTGCAAGG CATTGTTGAT GTCATCACCA AAGGTTTCAT TTTCATCTTT TCTTGGGGTT 360

GGTCCAACAG CTGTCAGCTT TCTCTTCCTC ATTAAAGGCA ACTTTCTCAT TTAAATCTCA 420

TATAGGTTCG GAGTTTCTTG CTTTGCTCCT TCCGCCTCCG CGATGACAGA AGCAATGGTT 480

AACTTCTCAA TTAAACTTGA TAGGGAAGGA AATGGCTTCA GAGGCGATCA GCCCTTTTGA 540

CTTACACACT TACACGTCTG AGTGGAGTGT TTTATTGCCG CCTTGTTTGG TGTCTCATGA 600

TTCAGAGTGA CAACTTCTGC AACACGTTTT AAAAAGGAAT ACAGTAGCTG ATCGCAAATT 660

GCTGGATCTA TCCCTTCCTC TCCTTTAATT TCCCTTGTAG ACAGCCTTCC TTCAAAAATA 720

CCTTATTTGA CCTCTACAGC TCTAGAAACA GCCAGGGCCT AATTTCCCTC TGTGGGTTGC 780

TAATCCGATT TAGGTGAACG AACCTAGAGT TATTTTAGCT AAAAGACTGA AAAGCTAGCA 840

CACGTGGGTA AAAAAATCAT TAAAGCCCCT GCTTCTGGTC TTTCTCGGTC TTTGCTTTGC 900

AAACTGGAAA GATCTGGTTC ACAACGTAAC GTTATCACTC TGGTCTTCTA CAGGAATGCT 960

CAGCCCATAG TTTTGGGGGT CCTGTGGGTA GCCAGTGGTG GTACTATAAG GCTCCTGAAT 1020

GTAGGGAGAA ATGGAAAGAT TCAAAAAAGA ATCCTGGCTC AGCAGCTTGG GGACATTTCC 1080

AGCTGAGGAA GAAAACTGGC TTGGCCACAG CCAGAGCCTT CTGCTGGAGA CCCAGTGGAG 1140

AGAGAGGACC AGGCAGAAAA TTCAAAGGTC TCAAACCGGA ATTGTCTTGT TACCTGACTC 1200

TGGAGTAGGT GGGTGTGGAA GGGAAGATAA ATATCACAAG TATCGAAGTG ATCGCTTCTA 1260

TAAAGAGAAT TTCTATTAAC TCTCATTGTC CCTCACATGG ACACACACAC ACACACACAC 1320

ACACACACAC ACACATCACT AGAAGGGATG TCACTTTACA AGTGTGTATC TATGTTCAGA 1380

AACCTGTACC CGTATTTTTA TAATTTACAT AAATAAATAC ATATAAAATA TATGCATCTT 1440 

TTTATTAGAT TCATTTATTT GAATATAAAT GTATGAATAT TTATAAAATG TAATAATGCA 1500 

CTCAGATGTG TATCGGCTAT TTCTCGACAT TTTCTTCTCA CCATTCAAAA CAGAAGCGTT 1560 

TGCTCACATT TTTGCCAAAA TGTCTAATAA CTTGTAAGTT CTGTTCTTCT TTTTAATGTG 1620

CTCTTACCTA AAAACTTCAA ACTCAAGTTG ATATTGGCCC AATGAGGGAA CTCAGAGGCC 1680 

AGTGGACTCT GGATTTGCCC TAGTCTCCCG CAGCTGTGGG CGCGGATCCA GGTCCCGGGG 1740 

GTCGGCTTCA CACTCATCCG GGACGCGACC CCTTAGCGGC CGCGCGCTCG CCCCGCCCCG 1800 

CTCCACCGCG GCCCCGTACG CGCCGTCCAC ACCCCTGCGC GCCCGTCCCG CCCGCCCGGG 1860 

GGATCCCGGC CGTGCTGCCT CCGAGGGGGA GGTGTTCGCC ACGGCCGGGA GGGAGCCGGC 1920 

AGGCGGCGTC TCCTTTAAAA GCCGCGAGCG CGCGCCAGCG CGGCTCGTCG CCGCCGGAGT 1980 

CCTCGCCCTG CCGCGCAGAG CCCTGCTCGC ACTGCGCCCG CCGCGTGCGC TTCCCACAGC 2040

CCGCCCGGGA TTGGCAGCCC CGGACGTAGC CTCCCCAGGC GACACCAGGC ACCGGGACGC 2100 

CCTCCCGGCG AAAGACGCGA GGGTCACCCG CGGCTTCGAG GGACTGGCAC GACACGGGTT 2160 

GGAACTCCAG ACTGTGCGCG CCTGGCGCTG TGGCCTCGGC TGTCCGGGAG AAGCTAGAGT 2220 

CGCGGACCGA CGCTAAGAAC CGGGAGTCCG GAGCACAGTC TTACCCTCAA TGCGGGGCCA 2280 

CTCTGACCCA GGAGTGAGCG CCCAAGGCGA TCGGGCGGAA GAGTGAGTGG ACCCCAGGCT 2340 

GCCACAAAAG ACACTTGGCC CGAGGGCTCG GAGCGCGAGG TCACCCGGTT TGGCAACCCG 2400 

AGACGCGCGG CTGGACTGTC TCGAGAATGA GCCCCAGGAC GCCGGGGCGC CGCAGCCGTG 2460

CGGGCTCTGC TGGCGAGCGC TGATGGGGGT GCGCCAGAGT CAGGCTGAGG GAGTGCAGAG 2520 

TGCGGCCCGC CCGCCACCCA AGATCTTCGC TGCGCCCTTG CCCGGACACG GCATCGCCCA 2580 

CGATGGCTGC CCCGAGCCAT GGGTCGCGGC CCACGTAACG CAGAACGTCC GTCCTCCGCC 2640 

CGGCGAGTCC CGGAGCCAGC CCCGCGCCCC GCCAGCGCTG GTCCCTGAGG CCGACGACAG 2700 

CAGCAGCCTT GCCTCAGCCT TCCCTTCCGT CCCGGCCCCG CACTCCTCCC CCTGCTCGAG 2760 

GCTGTGTGTC AGCACTTGGC TGGAGACTTC TTGAACTTGC CGGGAGAGTG ACTTGGGCTC 2820 

CCCACTTCGC GCCGGTGTCC TCGCCCGGCG GATCCAGTCT TGCCGCCTCC AGCCC 2875

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
CCCGGCAAGT TCAAGAAG 18

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15144 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

GAATTCATTT AAGCTGGATT CACTTCTAGG TCCCATGCGT TTACACTCAT TTCCACCACA 60 

AGAGGGCAGC CATCTCTAAA AAAACAACAG TCGAGTGCTC TTCAGAGAAA TTGGGCCAAA 120 

CTTGAGGAAA GTTCCTGGGA AAGGCTTTTT AGCAGCACCT CTCTGGGCTA CAAAAAAGAA 180 

GCCAGCAGGC ACCACCAAGG TGGAGTAACT GTCCAGAGGC ATCCATTTTA CCTCAGAGAC 240 

TTGATTACTA AGGATATCCT AAACGGCCAA ACTCTCTCTT CTGGTGTTCC AGAGGCCCAA 300 

AGCTGCAAGG CATTGTTGAT GTCATCACCA AAGGTTTCAT TTTCATCTTT TCTTGGGGTT 360 

GGTCCAACAG CTGTCAGCTT TCTCTTCCTC ATTAAAGGCA ACTTTCTCAT TTAAATCTCA 420 

TATAGGTTCG GAGTTTCTTG CTTTGCTCCT TCCGCCTCCG CGATGACAGA AGCAATGGTT 480 

AACTTCTCAA TTAAACTTGA TAGGGAAGGA AATGGCTTCA GAGGCGATCA GCCCTTTTGA 540 

CTTACACACT TACACGTCTG AGTGGAGTGT TTTATTGCCG CCTTGTTTGG TGTCTCATGA 600 

TTCAGAGTGA CAACTTCTGC AACACGTTTT AAAAAGGAAT ACAGTAGCTG ATCGCAAATT 660

GCTGGATCTA TCCCTTCCTC TCCTTTAATT TCCCTTGTAG ACAGCCTTCC TTCAAAAATA 720 

CCTTATTTGA CCTCTACAGC TCTAGAAACA GCCAGGGCCT AATTTCCCTC TGTGGGTTGC 780 

TAATCCGATT TAGGTGAACG AACCTAGAGT TATTTTAGCT AAAAGACTGA AAAGCTAGCA 840
CACGTGGGTA AAAAAATCAT TAAAGCCCCT GCTTCTGGTC TTTCTCGGTC TTTGCTTTGC 900

AAACTGGAAA GATCTGGTTC ACAACGTAAC GTTATCACTC TGGTCTTCTA CAGGAATGCT 960

CAGCCCATAG TTTTGGGGGT CCTGTGGGTA GCCAGTGGTG GTACTATAAG GCTCCTGAAT 1020 

GTAGGGAGAA ATGGAAAGAT TCAAAAAAGA ATCCTGGCTC AGCAGCTTGG GGACATTTCC 1080 

AGCTGAGGAA GAAAACTGGC TTGGCCACAG CCAGAGCCTT CTGCTGGAGA CCCAGTGGAG 1140 

AGAGAGGACC AGGCAGAAAA TTCAAAGGTC TCAAACCGGA ATTGTCTTGT TACCTGACTC 1200

TGGAGTAGGT GGGTGTGGAA GGGAAGATAA ATATCACAAG TATCGAAGTG ATCGCTTCTA 1260 

TAAAGAGAAT TTCTATTAAC TCTCATTGTC CCTCACATGG ACACACACAC ACACACACAC 1320 

ACACACACAC ACACATCACT AGAAGGGATG TCACTTTACA AGTGTGTATC TATGTTCAGA 1380 

AACCTGTACC CGTATTTTTA TAATTTACAT AAATAAATAC ATATAAAATA TATGCATCTT 1440 

TTTATTAGAT TCATTTATTT GAATATAAAT GTATGAATAT TTATAAAATG TAATAATGCA 1500 

CTCAGATGTG TATCGGCTAT TTCTCGACAT TTTCTTCTCA CCATTCAAAA CAGAAGCGTT 1560 

TGCTCACATT TTTGCCAAAA TGTCTAATAA CTTGTAAGTT CTGTTCTTCT TTTTAATGTG 1620 

CTCTTACCTA AAAACTTCAA ACTCAAGTTG ATATTGGCCC AATGAGGGAA CTCAGAGGCC 1680 

AGTGGACTCT GGATTTGCCC TAGTCTCCCG CAGCTGTGGG CGCGGATCCA GGTCCCGGGG 1740 

GTCGGCTTCA CACTCATCCG GGACGCGACC CCTTAGCGGC CGCGCGCTCG CCCCGCCCCG 1800 

CTCCACCGCG GCCCCGTACG CGCCGTCCAC ACCCCTGCGC GCCCGTCCCG CCCGCCCGGG 1860

GGATCCCGGC CGTGCTGCCT CCGAGGGGGA GGTGTTCGCC ACGGCCGGGA GGGAGCCGGC 1920 

AGGCGGCGTC TCCTTTAAAA GCCGCGAGCG CGCGCCAGCG CGGCGTCGTC GCCGCCGGAG 1980 

TCCTCGCCCT GCCGCGCAGA GCCCTGCTCG CACTGCGCCC GCCGCGTGCG CTTCCCACAG 2040

CCCGCCCGGG ATTGGCAGCC CCGGACGTAG CCTCCCCAGG CGACACCAGG CACCGGAGCC 2100 

CCTCCCGGCG AAAGACGCGA GGGTCACCCG CGGCTTCGAG GGACTGGCAC GACACGGGTT 2160 

GGAACTCCAG ACTGTGCGCG CCTGGCGCTG TGGCCTCGGC TGTCCGGGAG AAGCTAGAGT 2220 

CGCGGACCGA CGCTAAGAAC CGGGAGTCCG GAGCACAGTC TTACCCTCAA TGCGGGGCCA 2280 

CTCTGACCCA GGAGTGAGCG CCCAAGGCGA TCGGGCGGAA GAGTGAGTGG ACCCCAGGCT 2340 

GCCACAAAAG ACACTTGGCC CGAGGGCTCG GAGCGCGAGG TCACCCGGTT TGGCAACCCG 2400 

AGACGCGCGG CTGGACTGTC TCGAGAATGA GCCCCAGGAC GCCGGGGCGC CGCAGCCGTG 2460

CGGGCTCTGC TGGCGAGCGC TGATGGGGGT GCGCCAGAGT CAGGCTGAGG GAGTGCAGAG 2520 

TGCGGCCCGC CCGCCACCCA AGATCTTCGC TGCGCCCTTG CCCGGACACG GCATCGCCCA 2580
CGATGGCTGC CCCGAGCCAT GGGTCGCGGC CCACGTAACG CAGAACGTCC GTCCTCCGCC 2640
CGGCGAGTCC CGGAGCCAGC CCCGCGCCCC GCCAGCGCTG GTCCCTGAGG CCGACGACAG 2700 
CAGCAGCCTT GCCTCAGCCT TCCCTTCCGT CCCGGCCCCG CACTCCTCCC CCTGCTCGAG 2760 
GCTGTGTGTC AGCACTTGGC TGGAGACTTC TTGAACTTGC CGGGAGAGTG ACTTGGGCTC 2820 
CCCACTTCGC GCCGGTGTCC TCGCCCGGCG GATCCAGTCT TGCCGCCTCC AGCCCGATCA 2880 

CCTCTCTTCC TCAGCCCGCT GGCCCACCCC AAGACACAGT TCCCTACAGG GAGAACACCC 2940 

GGAGAAGGAG GAGGAGGCGA AGAAAAGCAA CAGAAGCCCA GTTGCTGCTC CAGGTCCCTC 3000 

GGACAGAGCT TTTTCCATGT GGAGACTCTC TCAATGGACG TGCCCCCTAG TGCTTCTTAG 3060 

ACGGACTGCG GTCTCCTAAA GGTAGAGGAC ACGGGCCGGG GACCCGGGGT TGGCTGGCGG 3120 

GTGACACCGC TTCCCGCCCA ACGCAGGGCG CCTGGGAGGA CTGGTGGAGT GGAGTGGACG 3180 

TAAACATACC CTCACCCGGT GCACGTGCAG CGGATCCCTA GAGGGGTTAG GCATTCCAAA 3240 

CCCCAGATCC CTCTGCCTTG CCCACTGGCC TCCTTCCTCC AGCCGGTTCC TCCTCCCCAA 3300 

GTTTTCGATA CATTATAAGG GCTGTTTTGG GCTTTCAAAA AAAAAAATGC AGAAATCCAT 3360 

TTAAGAGTAT GGCCAGTAGA TTTTACTAGT TCATTGCTGA CCAGTAAGTA CTCCAAGCCT 3420 

TAGAGATCCT TGGCTATCCT TAAGAAGTAG GTCCATTTAG GAAGATACTA AAAGTTGGGG 3480 

TTCTCCATGT GTGTTTACTG ACTATGCGAA TGTGTCATAG CTTACACGTG CATTCATAAA 3540 

CACTATCTAT TTAGTTAATT GCAGGAAGGT GCATGGATTT CTTGACTGCA CAGGAGTCTT 3600 

GGGGAAGGGG GAACAGGGTT GCCTGTGGGT CAACCTTAAA TAGTTAGGGC GAGGCCACAA 3660 

CTTGCAAGTG GCGTCATTAG CAGTAATCTT GAGTTTAGCG CTTACTGAAT CTACAAGTTT 3720

GATATGCTCA ACTACCAGGA AATTGTATAC AGCGCCTCTA AGGAAGTCAC TTGTGCATTT 3780 

GTGTCTGTTA ATATGCACAT GAGGCTGCAC TGTATAAGTT TGTCAGGGAT GCAGTGTCCG 3840

ACCAACCTAT GGCTTCCCAG CTTCCTGACA CCCGCATTCC CAGCTAGTGT CACAAGAAAA 3900 

GGGTACAGAC GGTCAAGCTC TTTTTAATTG GGAGTTAAGA CCAAGCCCCA AGTAAGAAGT 3960

CCGGCTGGGA CTTGGGGGTC CTCCATCGGC CAGCGAGCTC TATGGGAGCC GAGGCGCGGG 4020 

GGCGGCGGAG GACTGGGCGG GGAACGTGGG TGACTCACGT CGGCCCTGTC CGCAGGTCGA 4080 

CCATGGTGGC CGGGACCCGC TGTCTTCTAG TGTTGCTGCT TCCCCAGGTC CTCCTGGGCG 4140 

GCGCGGCCGG CCTCATTCCA GAGCTGGGCC GCAAGAAGTT CGCCGCGGCA TCCAGCCGAC 4200 

CCTTGTCCCG GCCTTCGGAA GACGTCCTCA GCGAATTTGA GTTGAGGCTG CTCAGCATGT 4260 

TTGGCCTGAA GCAGAGACCC ACCCCCAGCA AGGACGTCGT GGTGCCCCCC TATATGCTAG 4320
ATCTGTACCG CAGGCACTCA GGCCAGCCAG GAGCGCCCGC CCCAGACCAC CGGCTGGAGA 4380
GGGCAGCCAG CCGCGCCAAC ACCGTGCGCA CGTTCCATCA CGAAGGTGAG CGGGCGGCGG 4440 
GTGGCGGGGC GGGGACGGCG GGCGGGCGGA GACTAGGCGG GCAGCCCGGG CCTCCACTAG 4500 
CACAGTAGAA GGCCTTTCGG CTTCTGTACG GTCCCCTCTG TGGCCCCAGC CAGGGATTCC 4560 
CCGCTTGTGA GTCCTCACCC TTTCCTGGCA AGTAGCCAAA AGACAGGCTC CTCCCCCTAG 4620
AACTGGAGGG AAATCGAGTG ATGGGGAAGA GGGTGAGAGA CTGACTAGCC CCTAGTCAGC 4680 
ACAGCATGCG AGATTTCCAC AGAAGGTAGA GAGTTGGAGC TCCTTAAATC TGCTTGGAAG 4740
CTCAGATCTG TGACTTGTGT TCACGCTGTA GTTTTAAGCT AGGCAGAGCA AGGGCAGAAT 4800

GTTCGGAGAT AGTATTAGCA AATCAAATCC AGGGCCTCAA AGCATTCAAA TTTACTGTTC 4860

ATCTGGGCCT AGTTTGAAAG ATTTCTGAAT CCCTATCTAA TCCCCGTGGG AGATCAATTC 4920 

CACAATTCGT CATATTGTTT CCACAATGAC CTTCGATTCT TTGCTTAAAT CTTAAATCTC 4980 

CAAGTGGAGA CAGCGCAACG CTTCAGATAA AAGCCTTTCT CCCACTGCCT GCTACCTTCC 5040 

TAGGCAAGGC AATGGGGTTT TTAAACAAAT ATATGAATAT GATTTCCCAA GATAGAATAA 5100 

TGTTGTTTAT TTCAGCTGAA ATTTCCTGGA TTAGAAAGGC TGTAGAGGCC TATTGAAGTC 5160 

TCTTGCACCG ATGTTCTGAA AGCAGTTAGT AAAAAATCAT GACCTAGCTC AATTCTGTGT 5220 

GTGCCACTTT CAATGTGCTT TTGACTTAAT GTATTCTCCA TAGAACATCA GTTCCTTCAA 5280

GTTCTAGAAG AATTCAGATT TAAAGTTTTG CTTTGCCTTG CTGAGGGGAT AAATTTTAAG 5340 

TAGAAATCTA GGCTCTGAAA TGATAGCCCA ACCCCATCTC CAGTAAGGGA TGACTGACTC 5400 

AAACCTTGAG AAGTCTGGGT GATAATAGGA AAAGTCCACA AGCAGGTCAC AGAGCGCGAG 5460 

ATGGATCTGT CTTGAGGCAG CCAATGGTTA TGAAGGGCAC TGGAAATCCA TCTCTTTCAA 5520 

ACTGGTGTCT AGGGCTTTCT GGGAGCAAAG CTTAGACCAC ATTCTGCTCC TCAAGGTTTG 5580 

CCTACTGAAA GCAGGGAGAT TCTGGGTGTT CACCCCCATC CTTCACCCCC AGGTGATTCT 5640 

GGGCTTAGCT AATCTCTCCT GGTTAATATT CATTGGAAAG TTTTTATAGA TCAAAACAAA 5700 

CAAACCTACT ATCCAGCACA GGTGTTTTTC CCACTGCCTC TGGAGATATA GCAAGAAAAC 5760 

CATATATTCA TGTATTTCCT TATTAGTCTT TTCTAACGTG AAAATTATTC CTGACCTATA 5820 

AAAAATGAAG GAGGTATTTT ATCTTAACTA AGCTAAAAGA ATCGCTTAAG TCAATTGAAA 5880 

CTCAAAAATC CAATTGAATG AAAGGTTCGT CAATAAAAAT CTACATTTTT CTTACTCTTC 5940 

CTTTGGAAAT AGCTTGATAA AAACACAGAC AAAACAAAGT CTGTGTGCTT ATTTGAAAAC 6000 

TTAGTGAGCT TCAGTTCATA AGCAAAAAAT GTAGTTTAAA AGTGATTTTT CTGTTGTAAA 6060
ACGTGATAGA AGTTATTGAC TTGTTTAAAA TAAACTTGCA CTAACTTTAT ACCTTGGTGC 6120

AATTAGATGT AATGTTTACT GTAAATTTCA GGAAAACCAT TTTTTTTTTT TGGTCATGAT 6180

CAGGTACACA TGGCATTTGG GAAGACTTTT CACATTGTTG AGTAACCTAG AGTTTGTTTG 6240 

TTTGTTTGTT TGTTTTTAAG CATTCTTGTG CCACTAGAAA AACCTTAATA AGCCATGTGT 6300

TACTTGGTAG ACTTCTTCCT AAGTTCTAGA AAGTGGCTTA ATGCCACGAT GAGACAAAAC 6360 

ATACCATAGT AGTCTTTCAA CCAGTGGCAG AGTCTTCCAG ACAAAATCTC CTGTTGAACA 6420 

TTAAGACCAT GGATTTTTAT CCAGGAGAGC CCAGGCTTTG CTGAATCACC ACCCTCCAAC 6480

CCCACTCCAA GGTCACCGAA GGCCTCCCCA ACTGGCTGCC ATTGAGAAAC TGTTTGAAAT 6540

TGATTGACTC CATTGGCCCT ACAGAGACTT CTCCTTTAGT GGCAGATCAT ATACTGAAGG 6600 

ATCCAAGCTT GCTCTTCTGA CTATGAAGAG CACAGTCTTT C ' lTmaTl ' ATGGAATAAA 6660 

CAAACTATGT GGCCCTGTGA CTAAAGTTTT CAAAGAGGGA GAGATCCTGT TAGCAGAAGT 6720 

GCAACTGCCC AGAAACTAGC CACAGGCTAG GATATTCCAA AGTACAACTC TAAAGTATGG 6780 

TCCATCCTAA ATTCTAGCAT GGGGTTGAAT ACCGGCATCC AGGAATACTT CTCTCTACCT 6840 

CTGGCTATTG CAGTGAGATT ACGAAGACCC TGGGGGGAAA AACAGTTGCT TAGTTTACAG 6900

ATGTTCCTTG CCACAGATGT TCTCAGTATC TCTTGTTTGT CAGAGGATCC TTTCAATCCC 6960 

TCTTGACATT TCCAATCTGC TTTTGTCCTC TCTACATGTG CCTTGTGGCA TTTCGCTTGG 7020 

TCTTTAGAGA ATCCCTTTCT GGAGCTGCAG GTTCCCTTGT AGGATCTGTG TTCAGGAGAA 7080 

CAGGGACCTT GGCAGGTTAG TGACAACTAC CAAACCCTGC TTTCCTTCCC TGCCACTTCC 7140 

TTTGTTGCCT TAAAAATTAA ACCTTAACTC TCTGTGTCTA AACCTTTTCT TCTTCCTCTT 7200

TGTCATTTAC TTTATTTATT TGTCATGTAC TTTATCCTGT AGAAAATCAC AGTGTGGCCC 7260 

AAAGCCCCTT GAATCTTGTT GCAGCGGTGA GATGCAGCTG CTGATCTGGA ATAGCCTTAG 7320

GCTGTGTGTT TGATCACAAT GCTTTCTGTC CAAAAGTGTG CAAATCCTCC AAGCTTAATG 7380 

ATAACTTTTG AAATGAAACT CACCCTACTT TAGGGCAAAC AAGTAGCCAC AGAGAGCAGG 7440 

ATCTAAACAA GGTCTGGTGT CCCATTTGGC TGTGTCCCTT CAATTTTCTG TTCATTTAGC 7500 

TCTGTCTGCA TCTAAAGGGT GCTGGGCAAT AAGTTTTGAT CTTCAGGGCA AAACTCAATC 7560 

TTCAGTTACC ATGGTATCAG GTACCAATTC CTAGTGATTT GTGCTATGGC TTAGGATTTG 7620 

ATTTCTCTCC TACATTAGGT AATATCTTTC AATGGCTAGA ACTTGGGCAT TGCAGTACAC 7680 

TCAAGTTAAC AGTTCTGTGA CCTAAGGAAG TCACATAACC TCTCTGAATT CTCTACTGTT 7740 

TCATTCACAA AATGGAGAAA ATCATGGCTC TTTCTTAATG TGCGAATTCA TAGAAAGGTG 7800
ATGACACCAG ATTTGGCAGA AGGAAGGAAA GGAAGGAAGG AAGAAAGAAA GAAAGAAAGA 7860

AAGAAAGAAA GAAAGAAAGA AAGAAAGAAA GGAAGGAAGG GAGAGAGAGA GAAGGGAAGG 7920

GAAAGGGAAA GGGAAAGGAA AGAAAAGAAA GGAAGGAAGA AAAGGAAGGA AGGAAGGAAA 7980

GAAGGAAGGA AGGAAAAGAA AGAGAAGAAA GCATTCAGCA TATGAACTAA TGTTTCCTGG 8040 

TGACTTTTTA TATCATATCC TTGTTCTAGG AAGTGGCCCT AGCCATATCT TTTGGGTTAT 8100 

TTTGAGGTAG AGGATAATCA ACATAGTGTA GAACATTAAA TCTGGGTTTT GTTTCTAGAA 8160 

GAGGCTAGAA TGGCATGGCT GTCCCACTTG CTCCTCTTTC AGGCAGTATG GCAGCCACCA 8220 

TTCTCTCTGT AAGATCTAGG AGGCTGACAC TCAGGTTGGA GACAGGTCAG AATCCTGAAA 8280 

TCACTTAGCA AGTTCAGCTG ATTCAACAAG GGATATTTAC AGAGAATTAA CAGCTATTCC 8340 

AGCTTCCAAA AAGTGTACAT TACCTACTCT GTATTTTCAG AACCCCAGGT TTGCTGTGAT 8400 

AATTTGGTAG AAGCCTTTTC CTGTAATTTT CTTTATTTAA AAGATATTTT CATTTTCCAC 8460 

CCTCAAGAAG AGGTTGAAAC TTGTCCCTTG AAGTAGAAGA GGTGTTGTGT GTCCTGACCC 8520 

TGAGGAAGTT GGCCTTGTTG AGGTCTTCTG TAAATTCTTG AATTCTCTGT ATAATTTCAA 8580

TGAATAGTCA TGTTTGATAC CTTGGTATAA AGGATGGGAT AAGATCTTTC AAGGCTTAGG 8640 

CTGATGGAAA CGCTGCTGAA AGACTAGAGA TTGCTCTTTC CTTTGGCATC TGTCTTGGGT 8700 

AGTAATATTG TTCTCTGTGA AGGCCCACTT ATTCTGTCTT GAAAATTCTT CTTACCTCCA 8760 

GAGTGATAGG CCACAGGGAG TACTGTTTCT ATGTTTGCAG TTGAAAGATG ACAATTTCAT 8820 

ATGGTCCAAA CTTGGCTTTA TTTCTTGGTG AGATATTATT CTGTTACTTC AATGACCTGT 8880 

CTCCATTATT TATCTTGAGG CTCACCTCTT CCCTTTTGTT GACTGTTGTG CAATTTGTGG 8940 

AAGGCCCTGG GTAGTCAGCC TTTATACTCT GTCTGTACAG GAAATAAAGT GCATGTCACC 9000 

ATGCCAAAGT CAGGAGATGC CGGTGTGATT AGGGTCCACG GGATTTTGCT ACTGTTTTTA 9060 

TTTCTATCGA TGAATTGCCT TAGGCAGAAA CATTAAGGGA CACCAGAATG GTGATGAAAG 9120 

GCTTTTTATA ACAGAAGCTA AATGCAGTCC TTCATACTTC ATGGAATGCC CCTGTCCTAA 9180 

AGTACCATTA ACCGATAGTG GAGTCAGAAC ATAAATGGCT CCCCAAAGGT ATCACCAAGA 9240 

ACTTTTGGCA AACAGATGCA AGAGGATTAT GAAGAATCGC AGCTTGGTCT GGTAATCTTC 9300 

CTGTTGCAAA GAGAAGAGCT TTAGAAGACC CCCCTTGAGT CCCTGGCTGG CTTAACATAG 9360 

CATGAACCCT CATGTGTTGG CCAACATTAA GGCTTTTTCT ATAAAAGTCT CCTCCTTCAT 9420 

CAGTATACGC TCGAGTATGA AAAGCATCCT TTTAAACCTT GACTCTGTGT GGTCCAGAAA 9480 

CAGCAGCATC CCTTGCTTAA GAGCTTAATG GAGATGCAGG AGTGCAGGCC TCTTCCCAGA 9540
CCGGCTGATG TGCAGGTCAA AGTCTAAGCA CTGCTGGATC AACACAGAAG TTATTCCGAA 9600

TGAGGATGAG ATGGATACGA GAGAACAGGA AGTAGGAAGG GATTTCTTTA TCGTGAATTG 9660 

CTACAGCAGC CTAATGTCAC CCCATACCCT TCTGAAGAAC TATGTCCCTG TGGATGCCTT 9720 

TGTCTCTAGA GTTCTGAGCA AAATGGTAGG GTGTGCTTTG CAAAATGTCA TCATTGATGT 9780

TGAATTTCAA AGTCTTTAAT TAAGGGGCTG AAATCTGTAT ATTGAGATTT GTAAATCATC 9840

TAAATTGTAG AGTAATGTTT GCACAGGCTG CTTAAGGGAT TGACATTAAA GCTCGTTTTC 9900 

TTAGTTAAGA AATACAGTCA TTTCCTCAAC TCCTCAGTCA TTAGCTCTCT ACTAAGTACA 9960

GTGCTGACTT TTTTAAAATT AAAGTCTGTG AATTCCAAAG AAGTGTTTCA CTATTTCCTC 10020 

CATTATTATA GCTACCTAGA AGCTATGTTC ATATATTGGA TTAAAAACGT AGCAATTACA 10080

AAGTTAATGT GGCCATATAG AAAAGGGAAA AGAAACTCCG CTTTCACTTT AATATATATA 10140

TGTGTGTGTG TATATCATAT ATATACATGT TGTGTGTGTA TATATATATA TATATATATA 10200 

TATATATATA TATATATATA TATATATATA TGTTGTGTTA AGCAGTAAAC TCAGGCCATG 10260 

GACAGAGGGG CAGACATTGT ATCTCTAGGC CTGACATTTT TAATTTCTGG TTGCAGGTTT 10320

TTATGTAGTT TAACTTAAAC CATGCACTGA AGTTTTAAAT GCTCGTAAGG AATTAAGTTA 10380

CCATTGGCTC TCTTACCAAA TGCGTTTCTT TTTTCTCTCC ACCCTGATCA AACTAGAAGC 10440

CGTGGAGGAA CTTCCAGAGA TGAGTGGGAA AACGGCCCGG CGCTTCTTCT TCAATTTAAG 10500 

TTCTGTCCCC AGTGACGAGT TTCTCACATC TGCAGAACTC CAGATCTTCC GGGAACAGAT 10560 

ACAGGAAGCT TTGGGAAACA GTAGTTTCCA GCACCGAATT AATATTTATG AAATTATAAA 10620 

GCCTGCAGCA GCCAACTTGA AATTTCCTGT GACCAGACTA TTGGACACCA GGTTAGTGAA 10680 

TCAGAACACA AGTCAGTGGG AGAGCTTCGA CGTCACCCCA GCTGTGATGC GGTGGACCAC 10740 

ACAGGGACAC ACCAACCATG GGTTTGTGGT GGAAGTGGCC CATTTAGAGG AGAACCCAGG 10800

TGTCTCCAAG AGACATGTGA GGATTAGCAG GTCTTTGCAC CAAGATGAAC ACAGCTGGTC 10860

ACAGATAAGG CCATTGCTAG TGACTTTTGG ACATGATGGA AAAGGACATC CGCTCCACAA 10920 

ACGAGAAAAG CGTCAAGCCA AACACAAACA GCGGAAGCGC CTCAAGTCCA GCTGCAAGAG 10980 

ACACCCTTTG TATGTGGACT TCAGTGATGT GGGGTGGAAT GACTGGATCG TGGCACCTCC 11040 

GGGCTATCAT GCCTTTTACT GCCATGGGGA GTGTCCTTTT CCCCTTGCTG ACCACCTGAA 11100

CTCCACTAAC CATGCCATAG TGCAGACTCT GGTGAACTCT GTGAATTCCA AAATCCCTAA 11160 

GGCATGCTGT GTCCCCACAG AGCTCAGCGC AATCTCCATG TTGTACCTAG ATGAAAATGA 11220 

AAAGGTTGTG CTAAAAAATT ATCAGGACAT GGTTGTGGAG GGCTGCGGGT GTCGTTAGCA 11280
CAGCAAGAAT AAATAAATAA ATATATATAT TTTAGAAACA GAAAAAACCC TACTCCCCCT 11340

GCCTCCCCCC CAAAAAAACC AGCTGACACT TTAATATTTC CAATGAAGAC TTTATTTATG 11400

GAATGGAATG AAAAAAACAC AGCTATTTTG AAAATATATT TATATCGTAC GAAAAGAAGT 11460 

TGGGAAAACA AATATTTTAA TCAGAGAATT ATTCCTTAAA GATTTAAAAT GTATTTAGTT 11520 

GTACATTTTA TATGGGTTCA ACTCCAGCAC ATGAAGTATA AGGTCAGAGT TATTTTGTAT 11580 

TTATTTACTA TAATAACCAC TTTTTAGGGA AAAAAGATAG TTAATTGTAT TTATATGTAA 11640 

TCAGAAGAAA TATCGGGTTT GTATATAAAT TTTCCAAAAA AGGAAATTTG TAGTTTGTTT 11700 

TTCAGTTGTG TGTATTTAAG ATGCAAAGTC TACATGGAAG GTGCTGAGCA AAGTGCTTGC 11760 

ACCACTTGCT GTCTGTTTCT TGCAGCACTA CTGTTAAAGT TCACAAGTTC AAGTCCAAAA 11820 

AAAAAAAAAA AGGATAATCT ACTTTGCTGA CTTTCAAGAT TATATTCTTC AATTCTCAGG 11880 

AATGTTGCAG AGTGGTTGTC CAATCCGTGA GAACTTTCAT TCTTATTAGG GGGATATTTG 11940 

GATAAGAACC AGACATTACT GATCTGATAG AAAACGTCTC GCCACCCTCC CTGCAGCAAG 12000 

AACAAAGCAG GACCAGTGGG AATAATTACC AAAACTGTGA CTATGTCAGG AAAGTGAGTG 12060 

AATGGCTCTT GTTCTTTCTT AAGCCTATAA TCCTTCCAGG GGGCTGATCT GGCCAAAGTA 12120 

CTAAATAAAA TATAATATTT CTTCTTTATT AACATTGTAG TCATATATGT GTACAATTGA 12180 

TTATCTTGTG GGCCCTCATA AAGAAGCAGA AATTGGCTTG TATTTTGTGT TTACCCTATC 12240 

AGCAATCTCT CTATTCTCCA AAGCACCCAA TTTTCTACAT TTGCCTGACA CGCAGCAAAA 12300 

TTGAGCATAT GTTTCCTGCC TGCACCCTGT CTCTGACCTG TCAGCTTGCT TTTCTTTCCA 12360 

GGATATGTGT TTGAACATAT TTCTCCAAAT GTTAAACCCA TTTCAGATAA TAAATATCAA 12420 

AATTCTGGCA TTTTCATCCC TATAAAAACC CTAAACCCCG TGAGAGCAAA TGGTTTGTTT 12480 

GTGTTTGCAG TGTCTACCTG TGTTTGCATT TTCATTTCTT GGGTGAATGA TGACAAGGTT 12540 

GGGGTGGGGA CATGACTTAA ATGGTTGGAG AATTCTAAGC AAACCCCAGT TGGACCAAAG 12600 

GACTTACCAA TGAGTTAGTA GTTTTCATAA GGGGGCGGGG GGAGTGAGAG AAAGCCAATG 12660 

CCTAAATCAA AGCAAAGTTT GCAGAACCCA AGGTAAAGTT CCAGAGATGA TATATCATAC 12720 

AACAGAGGCC ATAGTGTAAA AAAATTAAAG AATGTCTGAT CAGCGTCTCA GCACATCTAC 12780 

CAATTGGCCA GATGCTCAAA CAGAGTGAAG TCAGATGAGG TTCTGGAAAG TGAGTCCTCT 12840

ATGATGGCAG AGCTTTGGTG CTCAGGTTGG AAGCAAAACC TAGGGAGGGA GGGCTTTGTG 12900 

GCTGTTTGCA GATTGGGGAA TCCAGTGCTA GTTCCTGGCA GGGTTTCAGG TCAGTTTCCG 12960 

GAGTGTGTGT CCTGTAGCCC TCCGTCATGG TTGAAGCCCA GGTCTCACCT CCTCTCCTGA 13020
CCCGTGCCTT AGAACTGACT TGGAAAGCGG TGTGCTTACA GCAAGACAGA CTCTTATAAT 13080
TAAATTCTTC CCAAGGACCT CCGTGCAATG ACCCCAAGCA CACTTACCTT CGGAAACCTT 13140
AAGGTTCTGA AGATCTTGTT TTAAATGACT ACCCTGGTTA GCTTTTGATG TGTTCCTTAT 13200
CCCTTTAGTT GTTGCACAGG TAGAAACGAT TAGACCCAAC TATGGGTAGC CTTGTCCTCC 13260
TGGTCCTTCA GTCATTCTCT AATGTCTCTT GCTTGCCATG GGCACTGTAA CAAACTGCAA 13320 
TCTTAACATC TTATAAAATG AATGAACCAC ATATTTACAT CTCCAAGTCC TCCAGATGGG 13380 

AGTGCGATCA TTCCATAAGG ATCCCACCTT CTGGCAGGTC TATCCAGTAC ATATTTTATG 13440 

CTTCATTGGT CTTGATTTTC TTGGCTAAAA TTACTTGTAG CACAGCAGGC CCCATGTGAC 13500 

ATATAGGTAT ATACATACAT GTATGTGCAT ATAGTGTGTA CATGTTCTAA TTTATACATA 13560 

GCTATGTGAA GATTATGTTA CATATGTAGA TGGTCGCACT TCTGATTTCC ATTTAGGTTC 13620 

AGAGAGAGAC GTCACAGTAA ATGGAGCTAT GTCATTGGTA TATCCCCGAG TGGTTCAGGT 13680

GTTCTCTCTA TTTTTTTAAG ATGGAGAACA CTCATCTGTA CTATCGAAAA CTGAGCCAAA 13740

TCACTTAGCA AATTTCTAGT CACTGCCTTG CTGTTAAGAT ACTGATTCAC TGGGTGCTGA 13800 

CATGCTGAGC CCTGCCTACT TTTGCATGAA GGACAAGGAA GAGAGCTTGC AGTTAAGAAT 13860

GGTATATGTG GGGCTAGGGG GCGGCGTATA GACTGGCATA TATGTGAAGG AAGGTCACAA 13920 

ACAGCCTGCA CTAATTTCCC TTTTCTGGTT TTATGTCTTG GCAGGGGAAA GGACAGGTAG 13980 

GGTGGGGTTG AGGGGGAGGG CACACACATC TACTTGGATA AATTGCATCT CCTCTTTCCT 14040

TCACCCCGCC ACCATATCTT AAAGCCTTAT GACATCCTCT AGGGCAGAAT TTTCTCACCA 14100 

GCTCCCCGCC CTACCAACTT CAAAGTGAAC TTCTAACTAA CTTGAGGGGC CAAAGTTCTA 14160

AATAAAACTT GTTAGAGTTT AGCGGGCACC TCAGTCATCA GGAATGCCTC CAGGAAAGCA 14220

AAAAGCTTGA TGTGTGTACA GCCACGTGGT GGAGTCCTGC CACCCTATGA TTCCTGTCCC 14280

AGTGGTCGTG TGGGGCCTGA GATCCTGAAT TTCTAATGAG CTCCCAGTAC GCCCTCACTC 14340 

ACTGTGCCAG AGGACTGCAG TTTGAGTAGC AAGGTTGTGT GACTGTCTTC GATCATGGCT 14400 

ACAGAAGCTG GCTCAAGTAC AGCCCTTCGT GTGTAAAAGC CATGTGTAAA TGAGAAGAAA 14460 

CAGAAGGCAA AGCTGCGTTG CATGGCATCT GAATCAGTGC CCTGCAGTTT TGTTTTTTGT 14520

TTTTTTTTTT TCAAAGACAT TCTTTTTCCC AACAAGATGA GTGGCAATCT TATGTTCTAG 14580

CCACTCTTAG ACATGAAAAC ACTGGGTTGC TTATCTTGTA AAATCTGCTC TGCTTGCTTG 14640 

CTTGGGCACG CTGCAGTCAG TTTAGTCAAA TGCGTGTCAG TACATCTATA TGTATGAGGG 14700 

AGCAGGTGCA AGTCCTTAGA AATGTACTTT AAAAAACTTG AACACTTAAG TCAGTGTGCT 14760
GAGCTGCTCC TGTGTGATGT TAGGCCAAGC ACCTGAGTTA AAGGGATCTC TTTGAAGGCA 14820

GAGGGTAGAT GTCGTATGGT TGAAGCATTT GTTTATACTA AAATGATGCT TGACTTTTTT 14880 

TCTAAGTTAT AAGACAGTAC ACTGTATAAG TTCATTGAAC CTAGAGGGTG GCATAGGACT 14940 

CCAAATCTGG TATGGGAGGT TTGTTCTAAT GGAAGTTCGA ATCTTTTTTG CAGTTGGCTT 15000 

GGAATAAAGT GCTTATGTGA ATGGGCTTAA GCTAGGGAAA AAAATGGGTT TCCCTCTGCA 15060 



AAGAGGGTCA GCACAGAAAT AACTTCCTGG CTTTGCTTGC ATGAATGCCA CTTGTTAGCA 15120

GATGCCCTGT GGGGATCCGA ATTC 15144

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9299 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

GAATTCGCTA GGTAGACCAG GCTGGCCCAG AACACCTAGA GATCATCTGG CTGCCTCTGT 60 

CTCTTGAGTT CTGGGGCTAA AGCATGCACC ACTCTACCTG GCTAGTTTGT ATCCATCTAA 120 

ATTGGGGAAG AAAGAAGTAC AGCTGTCCCC AGAGATAACA GCTGGGTTTT CCCATCAAAC 180 

ACCTAGAAAT CCATTTTAGA TTCTAAATAG GGTTTGTCAG GTAGCTTAAT TAGAACTTTC 240 

AGACTGGGTT TCACAGACTG GTTGGGCCAA AGGTCACTTT ATTGTCTGGG TTTCAGCAAA 300 

ATGAGACAAT AGCTGTTATT CAAACAACAT TTGGGTAAGG AAGAAAAATG AACAAACACC 360

ACTCTCCCTC CCCCCGCTCC GTGCCTCCAA ATCCATTAAA GGCAAAGCTG CACCCCTAAG 420 

GACAACGAAT CGCTGCTGTT TGTGAGTTTA AATATTAAGG AACACATTGT GTTAATGATT 480 

GGAGCAGCAG TGATTGATGT AGTGGCATTG GTGAGCACTG AATCCGTCCT TCAACCTGCT 540 

ATGGGAGCAC AGAGCCTGAT GCCCCAGGAG TAATGTAATA GAGTAATGTA ATGTAATGGA 600 

GTTTTAATTT TGTGTTGTTG TTTTAAATAA TTAATTGTAA TTTTGGCTGT GTTAGAAGCT 660 

GTGGGTACGT TTCTCAGTCA TCTTTTCGGT CTGGTGTTAT TGCCATACCT TGATTAATCG 720 

GAGATTAAAA GAGAAGGTGT ACTTAGAAAC GATTTCAAAT GAAAGAAGGT ATGTTTCCAA 780 

TGTGACTTCA CTAAAGTGAC AGTGACGCAG GGAATCAATC GTCTTCTAAT AGAAAGGGCT 840 

CATGGAGACC TGAGCTGAAT CTTTCTGTTC TGGATGAGAG AGGTGGTACC CATTGGAATG 900
AAAGGACTTA GTCAGGGGCA ATACAGTGTG CTCCAAGGCT GGGGATGGTC AGGATGTTGT 960

GCTCAGCCTC TAACACTCCT TCCAACCTGA CATTCCTTCT CACCCTTTGT CTCTGGCCAG 1020 

TAGAATACAG GAACTCGTTC CTGTTTTTTT TTTTTTAAAT TCTGAAGGTG TGTAAGTACA 1080 

AAGGTCAGAT GAGCGGCCCT AGGTCAAGAC TGCTTTGTGG TGACAAGGGA GTATAACACC 1140

CACCCCAGAA ACCAAGAACC GGAAATTGCT ATCTTCCAGC CCTTTGAGAG CTACCTGAAG 1200 

CTCTGGGCTG CTGGCCTCAC CCCTTCCCTG CAGCTTTCCC TTTAGCAGAG GCTGTGATTT 1260 

CCTTCAGCGC TTGGGCAAAT ACTCTTAGCC TGGCTCACCT TCCCCATCCT CGTTTGTAAA 1320 

AACAAAGATG AAGCTGATAG TTCCTTCCCA GCTCCATCAG AGGCAGGGTG TGAAATTAGC 1380

TCCTGTTTGG GAAGGTTTAA AAGCCGGCCA CATTCCACCT CCCAGCTAGC ATGATTACCA 1440 

ACTCTTGTTT CTTACTGTTG TTATGAAAGA CTCAATTCCT CATCTCCCTT TCCCTTCTTT 1500 

TAAAAAGGGG CCAAAGGGCA CTTTGTTTTT TTCTCTACAT GGCCTAAAAG GCACTGTGTT 1560 

ACCTTCCTGG AAGGTCCCAA ACAAACAAAC AAACAAACAA AATAACCATC TGGCAGTTAA 1620 

GAAGGCTTCA GAGATATAAA TAGGATTTTC TAATTGTCTT ACAAGGCCTA GGCTGTTTGC 1680 

CTGCCAAGTG CCTGCAAACT ACCTCTGTGC ACTTGAAATG TTAGACCTGG GGGATCGATG 1740 

GAGGGCACCC AGTTTAAGGG GGGTTGGTGC AATTCTCAAA TGTCCACAAG AAACATCTCA 1800 

CAAAAACTTT TTTGGGGGGA AAGTCACCTC CTAATAGTTG AAGAGGTATC TCCTTCGGGC 1860 

ACACAGCCCT GCTCACAGCC TGTTTCAACG TTTGGGAATC CTTTAACAGT TTACGGAAGG 1920 

CCACCCTTTA AACCAATCCA ACAGCTCCCT TCTCCATAAC CTGATTTTAG AGGTGTTTCA 1980

TTATCTCTAA TTACTCGGGG TAAATGGTGA TTACTCAGTG TTTTAATCAT CAGTTTGGGC 2040 

AGCAGTTATT CTAAACTCAG GGAAGCCCAG ACTCCCATGG GTATTTTTGG AAGGTACAGA 2100 

GACTAGTTGG TGCATGCTTT CTAGTACCTC TTGCATGTGG TCCCCAGGTG AGCCCCGGCT 2160 

GCTTCCCGAG CTGGAGGCAT CGGTCCCAGC CAAGGTGGCA ACTGAGGGCT GGGGAGCTGT 2220 

GCAATCTTCC GGACCCGGCC TTGCCAGGCG AGGCGAGGCC CCGTGGCTGG ATGGGAGGAT 2280

GTGGGCGGGG CTCCCCATCC CAGAAGGGGA GGCGATTAAG GGAGGAGGGA AGAAGGGAGG 2340 

GGCCGCTGGG GGGAAAGACT GGGGAGGAAG GGAAGAAAGA GAGGGAGGGA AAAGAGAAGG 2400

AAGGAGTAGA TGTGAGAGGG TGGTGCTGAG GGTGGGAAGG CAAGAGCGCG AGGCCTGGCC 2460 

CGGAAGCTAG GTGAGTTCGG CATCCGAGCT GAGAGACCCC AGCCTAAGAC GCCTGCGCTG 2520 

CAACCCAGCC TGAGTATCTG GTCTCCGTCC CTGATGGGAT TCTCGTCTAA ACCGTCTTGG 2580 

AGCCTGCAGC GATCCAGTCT CTGGCCCTCG ACCAGGTTCA TTGCAGCTTT CTAGAGGTCC 2640
CCAGAAGCAG CTGCTGGCGA GCCCGCTTCT GCAGGAACCA ATGGTGAGCA GGGCAACCTG 2700

GAGAGGGGCG CTATTCTGAG GATTCGAGGT GCACCCGTAG TAGAAGCTGG GGATGGGGCT 2760 

CAGGCTGTAA CCGAGGCAAA AGTTGGCCTA TTCCTCCTTC CTTCTCCAAC AGTGTTGGAG 2820 

GTGGGATGAT GGAGGCTAAA AGGCACCTCC ATATATGTTA CTGCGTCTAT CAACCTACTT 2880

TAGGGAGGTG CGGGCCAGGA GAGGCGGGAA GGAGAGAAGG CCTTGGAAGA GAGGTCATTG 2940 

GGAAGAACTG TGGGGTTTGG TGGGTTTGCT TCCACTTAGA CTATAAGAGT GGGAGAGGAG 3000 

GGAGTCAACT CTAAGTTTCA ACACCAGTGG GGGACTGAGG ACTGCTTCAT TAGGAGAGAG 3060 

AACCTAGCCA GAGCTAGCTT TGCAAAAGAG GCTGTAGTCC TGCTTTGCTC TAAAGCGCGA 3120 
CCCGGGATAG AGAGGCTTCC TTGAGCGGGG TGTCACCTAA TCTTGTCCCC AACGCACCCC 3180

CTCCCAGCCC CTGAGAGCTA GCGAACTGTA GGTACACAAC TCGCTCCCAT CTCCAGGAGC 3240 

TATTTTCTTA GACATGGGCA CCCATGATTC TGCCTTCTGG TACTCTCCCC TCCCTGGGAA 3300 

AGGGGTGTAA GGTTCCGACG GAACCGTGGC CAGGATGCCG AAAGGCTACC TGTGCGGGTC 3360 

TTCTGCCATG CTGTGTCTGT GCGGACATGC CAGCAGGGCT AATGAGGAGC TTGCGATACT 3420 

CCAAAGGGTT CGGGAATTGC GGGGTCCTTA CACGCAGTGG AGTTGGGCCC CTTTTACTCA 3480 

GAAGGTTTCC GCCACGGCTT TGGTTGATAG TTTTTTTAGT ATCCTGGTTT ATGAACTGAA 3540

GGTTTTGTGA GATGTTGAAT CACTAGCAGG GTCATATTTG GCAAACCGAG GCTACTATTA 3600 

AATTTTGGTT TTAGAAGAAG ATTCTGGGGA GAAAGTGAAG GGTAACTGCC TCCAGGAGCT 3660 

GTATCAACCC CATTAAGAAA AAAAAAAATA CCAGGAGATG AAAATTTACT TTGATCTGTA 3720 

TTTTTTAATT AAAAAAAATC AGGGAAGAAA GGAGTGATTA GAAAGGGATC CTGAGCGTCG 3780

GCGGTTCCAC GGTGCCCTCG CTCCGCGTGC GCCAGTCGCT AGCATATCGC CATCTCTTTC 3840 

CCCCTTAAAA GCAAATAAAC AAATCAACAA TAAGCCCTTT GCCCTTTCCA GCGCTTTCCC 3900 

AGTTATTCCC AGCGGCGACG CGTGTCGGGG AATAGAGAAA TCGTCTCAGA AAGCTGCGCT 3960 

GATGGTGGTG AGAGCGGACT GTCGCTCAGG GGCGCCCGCG GTCTCTGCAC CCAGGGCAGC 4020

AGTGTGGGAT GGCGCTGGGC AGCCACCGCC GCCAGGAAGG ACGTGACTCT CCATCCTTTA 4080 

CACTTCTTTC TCAAAGGTTT CCCGAAA3TG CCCCCCGCCT CGAAAACTGG GGCCGGTGCG 4140 

GGGGGGGGGA GAGGTTAGGT TGAAAACCAG CTGGACACGT CGAGTTCCTA AGTGAGGCAA 4200 

AGAGGCGGGG TGGAGCGGGC TCTGGAGCGG GGGAGTCCTG GGACTCGGTC CTCGGATGGA 4260 

CCCCGTGCAA AGACCTGTTG GAACAAGAGT TGCGCTTCCG AGGTTAGAAC AGGCCAGGCA 4320 

TCTTAGGATA GTCAGGTCAC CCCCCCCCCC AACCCCACCC GAGTTGTGTT GGTGAATTTC 4380
TTGGAGGAAT CTTAGCCGCG ATTCTGTAGC TGGTGCAAAA GGAGGAAAGG GGTGGGGGAA 4440

GGAAGTGGCT GTGCGGGGGT GGCGGTGGGG GTGGAGGTGG TTTAAAAAGT AAGCCAAGCC 4500 

AGAGGGAGAG GTCGAGTGCA GGCCGAAAGC TGTTCTCGGG TTTGTAGACG CTTGGGATCG 4560 

CGCTTGGGGT CTCCTTTCGT GCCGGGTAGG AGTTGTAAAG CCTTTGCAAC TCTGAGATCG 4620 

TAAAAAAAAT GTGATGCGCT CTTTCTTTGG CGACGCCTGT TTTGGAATCT GTCCGGAGTT 4680 

AGAAGCTCAG ACGTCCACCC CCCACCCCCC GCCCACCCCC TCTGCCTTGA ATGGCACCGC 4740 

CGACCGGTTT CTGAAGGATC TGCTTGGCTG GAGCGGACGC TGAGGTTGGC AGACACGGTG 4800 

TGGGGACTCT GGCGGGGCTA CTAGACAGTA CTTCAGAAGC CGCTCCTTCT AACTTTCCCA 4860 

CACCGCTCAA ACCCCGACAC CCCCGCGGCG GACTGAGTTG GCGACGGGGT CAGAGTCTTC 4920

TGGCTGAAAG TTAGATCCGC TAGGGGTCGG CTGCCTGTCG CTAGAAGCAT TATTTGGCCT 4980 

CTCGGAGACC CGTGTGGAGG AAGTGCTGGA GTGTGCGAGT GTGTTTGCGT GTGTGTGTGT 5040 

GTGTGTGTGT GTGTGTGTGT GTGTGTGTGT GTGCGCGCGC CCTTGGAGGG TCCCTATGCG 5100 

CTTTCCTTTT CATGGAACGC TGTCGTGAGG CTTTGGTAAA CTGTCTTTTC GGTTCCTCTC 5160 

TCGGCTGCAC TTAAGCTTTG TCGGCGCTGT AAAGAGACGC GTCTTCAAGT GCACCCTGAT 5220 

CCTCAGGCTT CAGATAACCC GTCCCCGAAC CTGGCCAGAT GCATTGCACT GCGCGCCGCA 5280 

GGTAGAGACG TGCCCCACGT CCCCTGCGTG CAGCGACTAC GACCGAGAGC CGCGCCAGTG 5340 

TGGTGTCCCG CCGAGAGTTC CTCAGAGCAG GCGGGGACAA CTCCCAGACG GCTGGGGCTC 5400 

CAGCTGCGGG CGCGGAGGTT GGCCTCGCTC GCAGGGGCTG GACCCAGCCG GGGTGGGAGG 5460 

ATGGAGGAGG GGCGGGCGGG CTCTTCGGTG AGTGGGGCGG GGCCTCTGGG TCCACGTGAC 5520 

TCCTAGGGGC TGGAAGAAAA ACAGAGCCTG TCTGCTCCAG AGTCTCATTA TATCAAATAT 5580 

CATTTTAGGA GCCATTCCGT AGTGCCATTC GGAGCGACGC ACTGCCGCAG CTTCTCTGAG 5640 

CCTTTCCAGC AAGTTTGTTC AAGATTGGCT CCCAAGAATC ATGGACTGTT ATTATGCCTT 5700 

GTTTTCTGTC AGTGAGTAGA CACCTCTTCT TTCCCTTCTT GGGATTTCAC TCTGTCCTCC 5760 

CATCCCTGAC CACTGTCTGT CCCTCCCGTC GGACTTCCAT TTCAGTGCCC CGCGCCCTAC 5820 

TCTCAGGCAG CGCTATGGTT CTCTTTCTGG TCCCTGCAAG GCCAGACACT CGAAATGTAC 5880 

GGGCTCCTTT TAAAGCGCTC CCACTGTTTT CTCTGATCCG CTGCGTTGCA AGAAAGAGGG 5940 

AGCGCGAGGG ACCAAATAGA TGAAAGGTCC TCAGGTTGGG GCTGTCCCTT GAAGGGCTAA 6000 

CCACTCCCTT ACCAGTCCCG ATATATCCAC TAGCCTGGGA AGGCCAGTTC CTTGCCTCAT 6060 

AAAAAAAAAA AAAAAAACAA AAAACAAACA GTCGTTTGGG AACAAGACTC TTTAGTGAGC 6120
ATTTTCAACG CAGCGACCAC AATGAAATAA ATCACAAAGT CACTGGGGCA GCCCCTTGAC 6180

TCCTTTTCCC AGTCACTGGA CCTTGCTGCC CGGTCCAAGC CCTGCCGGCA CAGCTCTGTT 6240 

CTCCCCTCCT CCTGTTCTTA ACCAGCTGGA AGTTGTGGAA ATTGGGCTGG AGGGCGGAGG 6300 

AAGGGCGGGG GTGGGGGGGT GGAGAAGGTG GGGGGGGGGG AGGCTGAAGG TCCGAAGTGA 6360 

AGAGCGATGG CATTTTAATT CTCCCTCCGC CTCCCCCCTT TACCTCCTCA ATGTTAACTG 6420 

TTTATCCTTG AAGAAGCCAC GCTGAGATCA TGGCTCAGAT AGCCGTTGGG ACAGGATGGA 6480

GGCTATCTTA TTTGGGGTTA TTTGAGTGTA AACAAGTTAG ACCAAGTAAT TACAGGGCGA 6540

TTCTTACTTT CGGGCCGTGC ATGGCTGCAG CTGGTGTGTG TGTGTGTAGG GTGTGAGGGA 6600 

GAAAACACAA ACTTGATCTT TCGGACCTGT TTTACATCTT GACCGTCGGT TGCTACCCCT 6660

ATATGCATAT GCAGAGACAT CTCTATTTCT CGCTATTGAT CGGTGTTTAT TTATTCTTTA 6720 

ACCTTCCACC CCAACCCCCT CCCCAGAGAC ACCATGATTC CTGGTAACCG AATGCTGATG 6780 

GTCGTTTTAT TATGCCAAGT CCTGCTAGGA GGCGCGAGCC ATGCTAGTTT GATACCTGAG 6840 

ACCGGGAAGA AAAAAGTCGC CGAGATTCAG GGCCACGCGG GAGGACGCCG CTCAGGGCAG 6900 

AGCCATGAGC TCCTGCGGGA CTTCGAGGCG ACACTTCTAC AGATGTTTGG GCTGCGCCGC 6960 

CGTCCGCAGC CTAGCAAGAG CGCCGTCATT CCGGATTACA TGAGGGATCT TTACCGGCTC 7020

CAGTCTGGGG AGGAGGAGGA GGAAGAGCAG AGCCAGGGAA CCGGGCTTGA GTACCCGGAG 7080 

CGTCCCGCCA GCCGAGCCAA CACTGTGAGG AGTTTCCATC ACGAAGGTCA GTTTCTGCTC 7140

TTAGTCCTGG CGGTGTAGGG TGGGGTAGAG CACCGGGGCA GAGGGTGGGG GGTGGGCAGC 7200 

TGGCAGGGCA AGCTGAAGGG GTTGTGGAAG CCCCCGGGGA AGAAGAGTTC ATGTTACATC 7260 

AAAGCTCCGA GTCCTGGAGA CTGTGGAACA GGGCCTCTTA CCTTCAACTT TCCAGAGCTG 7320 

CCTCTGAGGG TACTTTCTGG AGACCAAGTA GTGGTGGTGA TGGGGGAGGG GGTTACTTTG 7380 

GGAGAAGCGG ACTGACACCA CTCAGACTTC TGCTACCTCC CAGTGGGTGT TCTTTAGCTA 7440 

TACCAAAGTC AGGGATTCTG CCCGTTTTGT TCCAAAGCAC CTACTGAATT TAATATTACA 7500 

TCTGTGTGTT TGTCAGGTTT ATCAATAGGG GCCTTGTAAT ACGATCTGAA TGTTTCCTAG 7560 

CGGATGTTTC TTTTCCAAAG TAAATCTGAG TTATTAATCC TCCAGCATCA TTACTGTGTT 7620 

GGAATTTATT TTCCCTTCTG TAACATGATC AACAAGGCGT GCTCTGTGTT TCTAGGATCG 7680 

CTGGGGAAAT GTTTGGTAAC ATACTCAAAA GTGGAGAGGG AGAGAGGGTG GCCCCTCTTT 7740 

TTCTTTACAA CCACTTGTAA AGAAAACTGT ACACAAAGCC AAGAGGGGGC TTTAAAAGGG 7800 

GAGTCCAAGG GTGGTGGAGT AAAAGAGTTG ACACATGGAA ATTATTAGGC ATATAAAGGA 7860
GGTTGGGAGA TACTTTCTGT CTTTGGTGTT TGACAAATGT GAGCTAAGTT TTGCTGGTTT 7920

GCTAGCTGCT CCACAACTCT GCTCCTTCAA ATTAAAAGGC ACAGTAATTT CCTCCCCTTA 7980 

GGTTTCTACT ATATAAGCAG AATTCAACCA ATTCTGCTAT TTTTTGTTTT TGTTTCTTGT 8040

TTTTGTTTTG TTTGGTTTTT TTTTTTTTTT TTTTTTTTTT GTCTCAGAAA AGCTCATGGG 8100

CCTTTTCTTT TCCCCTTTCA ACTGTGCCTA GAACATCTGG AGAACATCCC AGGGACCAGT 8160 

GAGAGCTCTG CTTTTCGTTT CCTCTTCAAC CTCAGCAGCA TCCCAGAAAA TGAGGTGATC 8220 

TCCTCGGCAG AGCTCCGGCT CTTTCGGGAG CAGGTGGACC AGGGCCCTGA CTGGGAACAG 8280

GGCTTCCACC GTATAAACAT TTATGAGGTT ATGAAGCCCC CAGCAGAAAT GGTTCCTGGA 8340

CACCTCATCA CACGACTACT GGACACCAGA CTAGTCCATC ACAATGTGAC ACGGTGGGAA 8400 

ACTTTCGATG TGAGCCCTGC AGTCCTTCGC TGGACCCGGG AAAAGCAACC CAATTATGGG 8460 

CTGGCCATTG AGGTGACTCA CCTCCACCAG ACACGGACCC ACCAGGGCCA GCATGTCAGA 8520 

ATCAGCCGAT CGTTACCTCA AGGGAGTGGA GATTGGGCCC AACTCCGCCC CCTCCTGGTC 8580 

ACTTTTGGCC ATGATGGCCG GGGCCATACC TTGACCCGCA GGAGGGCCAA ACGTAGTCCC 8640 

AAGCATCACC CACAGCGGTC CAGGAAGAAG AATAAGAACT GCCGTCGCCA TTCACTATAC 8700 

GTGGACTTCA GTGACGTGGG CTGGAATGAT TGGATTGTGG CCCCACCCGG CTACCAGGCC 8760 

TTCTACTGCC ATGGGGACTG TCCCTTTCCA CTGGCTGATC ACCTCAACTC AACCAACCAT 8820 

GCCATTGTGC AGACCCTAGT CAACTCTGTT AATTCTAGTA TCCCTAAGGC CTGTTGTGTC 8880 

CCCACTGAAC TGAGTGCCAT TTCCATGTTG TACCTGGATG AGTATGACAA GGTGGTGTTG 8940 

AAAAATTATC AGGAGATGGT GGTAGAGGGG TGTGGATGCC GCTGAGATCA GACAGTCCGG 9000 

AGGGCGGACA CACACACACA CACACACACA CACACACACA CACACACACA CACGTTCCCA 9060 

TTCAACCACC TACACATACC ACACAAACTG CTTCCCTATA GCTGGACTTT TATCTTAAAA 9120 

AAAAAAAAAA GAAAGAAAGA AAGAAAGAAA GAAAAAAAAT GAAAGACAGA AAAGAAAAAA 9180 

AAAACCCTAA ACAACTCACC TTGACCTTAT TTATGACTTT ACGTGCAAAT GTTTTGACCA 9240 

TATTGATCAT ATTTTGACAA ATATATTTAT AACTACATAT TAAAAGAAAA TAAAATGAG 9299 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 19 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
CGGATGCCGA ACTCACCTA 19 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE : cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CTACAAACCC GAGAACAG 
(2) INFORMATION FOR SEQ ID NO:10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
CCCGGCACGA AAGGAGAC 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
GAAGGCAAGA GCGCGAGG 
Claims 

1. A system for identifying osteogenic agents comprising a recombinant host cell 
modified to contain an expression sequence comprising a promoter derived from a gene 
encoding a bone morphogenic protein operatively linked to a reporter gene encoding an 
assayable product. 

2. The system of claim 1 wherein said bone morphogenic protein is selected from 
the group consisting of the BMP-2 and BMP-4 proteins. 

3. The system of claim 1 or 2 wherein said reporter gene comprises a gene 
encoding the production of an assayable product selected from the group consisting of 
firefly luciferase, chloramphenicol acetyl transferase, β-galactosidase, green fluorescent 
protein, human growth hormone, alkaline phosphatase and β-glucuronidase. 

4. The system of claim 3 wherein said reporter gene comprises a gene encoding 
the production of firefly luciferase. 

5. A method for identifying an osteogenic compound comprising the steps of: 

culturing the cells of any of claims 1-4 under conditions which permit expression of 
said assayable product from said reporter gene; 

contacting said cells with at least one candidate compound suspected of possessing 
osteogenic activity; 

measuring the amount of assayable product produced in the presence of said 
candidate compound and comparing said amount to the amount of assayable product 
produced in the absence of said candidate compound; and 

identifying, as an osteogenic compound, a candidate compound that enhances the 
amount of said assayable product when present. 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

CCCGGTCTCA GGTATCA 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 
CAGGCCGAAA GCTGTTC 
6. An isolated nucleic acid molecule comprising a nucleotide sequence encoding 
the promoter region of a gene encoding bone morphogenetic protein selected from the 
group consisting of the BMP-2 and BMP-4 proteins. 

7. The nucleic acid molecule of claim 6 which corresponds to a nucleotide 
sequence selected from the group consisting of positions -2372 to +316 of the BMP-4 gene 
depicted in Figure 1C (SEQ ID NO:3), a portion thereof which encodes a biologically 
active promoter, the BMP-2 sequence depicted in Figure 11, and a portion thereof which 
encodes a biologically active promoter. 

8. A recombinant expression vector comprising the nucleotide sequence of claim 
6 or 7. 

9. The recombinant expression vector of claim 8 wherein said nucleotide 
sequence is operatively linked to a reporter gene encoding an assayable product. 

10. The recombinant expression vector of claim 9 wherein said reporter gene 
comprises a gene encoding the production of an assayable product selected from the group 
consisting of firefly luciferase, chloramphenicol acetyl transferase, β-galactosidase, green 
fluorescent protein, human growth hormone, alkaline phosphatase or β-glucuronidase. 
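
By way of illustration only, the comparison recited in claim 5 (amount of assayable product in the presence of a candidate compound versus the amount in its absence) may be sketched as follows. The function names, the example readings, and the two-fold cutoff are assumptions introduced for this sketch; the claims require only that the candidate compound enhance the amount of assayable product and do not specify a threshold.

```python
from statistics import mean


def fold_induction(treated, untreated):
    """Ratio of mean reporter signal with the candidate to mean signal without it."""
    baseline = mean(untreated)
    if baseline <= 0:
        raise ValueError("untreated reporter signal must be positive")
    return mean(treated) / baseline


def identify_osteogenic(candidates, untreated, threshold=2.0):
    """Return candidate names whose fold induction meets the threshold.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    return [
        name
        for name, readings in candidates.items()
        if fold_induction(readings, untreated) >= threshold
    ]


# Hypothetical reporter readings (e.g. luciferase light units) for replicate wells.
untreated = [100.0, 110.0, 90.0]
candidates = {
    "compound_A": [310.0, 290.0, 300.0],
    "compound_B": [105.0, 95.0, 100.0],
}
hits = identify_osteogenic(candidates, untreated)
```

In practice the readings would come from whichever reporter of claims 3 and 4 is used, and a statistical test on the replicates could replace the fixed cutoff.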
1 GAATTCATTT AAGCTGGATT CACTTCTAGG TCCCATGCGT TTACACTCAT 
51 TTCCACCACA AGAGGGCAGC CATCTCTAAA AAAACAACAG TCGAGTGCTC 
101 TTCAGAGAAA TTGGGCCAAA CTTGAGGAAA GTTCCTGGGA AAGGCTTTT^ 
151 AGCAGCACCT CTCTGGGCTA CAAAAAAGAA GCCAGCAGGC ACCACCAAGG 
201 TGGAGTAACT GTCCAGAGGC ATCCATTTTA CCTCAGAGAC TTGATTACTA 
251 AGGATATCCT AAACGGCCAA ACTCTCTCTT CTGGTGTTCC AGAGGCCCAA 
301 AGCTGCAAGG CATTGTTGAT GTCATCACCA AAGGTTTCAT TTTCATCTTT 
351 TCTTGGGGTT GGTCCAACAG CTGTCAGCTT TCTCTTCCTC ATTAAAGGCA 
401 ACTTTCTCAT TTAAATCTCA TATAGGTTCG GAGTTTCTTG CTTTGCTCCT 
451 TCCGCCTCCG CGATGACAGA AGCAATGGTT AACTTCTCAA TTAAACTTGA 
501 TAGGGAAGGA AATGGCTTCA GAGGCGATCA GCCCTTTTGA CTTACACACT 
551 TACACGTCTG AGTGGAGTGT TTTATTGCCG CCTTGTTTGG TGTCTCATGA 
601 TTCAGAGTGA CAACTTCTGC AACACGTTTT AAAAAGGAAT ACAGTAGCTG 
651 ATCGCAAATT GCTGGATCTA TCCCTTCCTC TCCTTTAATT TCCCTTGTAG 
701 ACAGCCTTCC TTCAAAAATA CCTTATTTGA CCTCTACAGC TCTAGAAACA 
751 GCCAGGGCCT AATTTCCCTC TGTGGGTTGC TAATCCGATT TAGGTGAACG 
801 AACCTAGAGT TATTTTAGCT AAAAGACTGA AAAGCTAGCA CACGTGGGTA 
851 AAAAAATCAT TAAAGCCCCT GCTTCTGGTC TTTCTCGGTC TTTGCTTTGC 
901 AAACTGGAAA GATCTGGTTC ACAACGTAAC GTTATCACTC TGGTCTTCTA 
951 CAGGAATGCT CAGCCCATAG TTTTGGGGGT CCTGTGGGTA GCCAGTGGTG 
1001 GTACTATAAG GCTCCTGAAT GTAGGGAGAA ATGGAAAGAT TCAAAAAAGA 
1051 ATCCTGGCTC AGCAGCTTGG GGACATTTCC AGCTGAGGAA GAAAACTGGC 
1101 TTGGCCACAG CCAGAGCCTT CTGCTGGAGA CCCAGTGGAG AGAGAGGACC 
1151 AGGCAGAAAA TTCAAAGGTC TCAAACCGGA ATTGTCTTGT TACCTGACTC 
1201 TGGAGTAGGT GGGTGTGGAA GGGAAGATAA ATATCACAAG TATCGAAGTG 

1251 ATCGCTTCTA TAAAGAGAAT TTCTATTAAC TCTCATTGTC CCTCACATGG 
1301 ACACACACAC ACACACACAC ACACACACAC ACACATCACT AGAAGGGATG 
1351 TCACTTTACA AGTGTGTATC TATGTTCAGA AACCTGTACC CGTATTTTTA 
1401 TAATTTACAT AAATAAATAC ATATAAAATA TATGCATCTT TTTATTAGAT 
1451 TCATTTATTT GAATATAAAT GTATGAATAT TTATAAAATG TAATAATGCA 
1501 CTCAGATGTG TATCGGCTAT TTCTCGACAT TTTCTTCTCA CCATTCAAAA 
1551 CAGAAGCGTT TGCTCACATT TTTGCCAAAA TGTCTAATAA CTTGTAAGTT 
1601 CTGTTCTTCT TTTTAATGTG CTCTTACCTA AAAACTTCAA ACTCAAGTTG 
1651 ATATTGGCCC AATGAGGGAA CTCAGAGGCC AGTGGACTCT GGATTTGCCC 
1701 TAGTCTCCCG CAGCTGTGGG CGCGGAXCCA GGTCCCGGGG GTCGGCTTCA 
1751 CACTCATCCG GGACGCGACC CCTTAGCGGC CGCGCGCTCG CCCCGCCCCG 
1801 CTCCACCGCG GCCCCGTACG CGCCGTCCAC ACCCCTGCGC GCCCGTCCCG 
1851 CCCGCCCGGG GGATCCCGGC CGTGCTGCCX CCGAGGGGGA GGTGTTCGCC 
1901 ACGGCCGGGA GGGAGCCGGC AGGCGGCGTC TCCTTTAAAA GCCGCGAGCG 
1951 CGCGCCAGCG CGGCGTCGTC GCCGCCGGAG TCCTCGCCCT GCCGCGCAGA 
2001 GCCCTGCTCG CACTGCGCCC GCCGCGTGCG CTTCCCACAG CCCGCCCGGG 
2051 ATTGGCAGCC CCGGACGTAG CCTCCCCAGG CGACACCAGG CACCGGAGCC 
2101 CCTCCCGGCG AAAGACGCGA GGGTCACCCG CGGCTTCGAG GGACTGGCAC 
2151 GACACGGGTT" GGAACTCCAG ACTGTGCGCG CCTGGCGCTG TGGCCXCGGC" 
2201 TGTCCGGGAC AAGCTAGAGTT CGCGGACCGA CGCTAAGAAC CGGGAGTCCG. 
2251 GAGCACAGTC TTACCCTCAA TGCGGGGCCA CT CTG ACCC A GG AGTGAGCG 
23 01 CCCAAGGCGA. TCGGGCGGAA GAGTGAGTGG ACCCCAGGCT GCCACAAAAG 
23 51 ACACTTGGCC: CGAGGGCTCC GAGCGCGAGG, TCACCCGGTT TGGCAACCCG' 
2401 kGXCGCGCGG: CTGGACTGTC TCGAGAATGA. GCCCCAGGAC GCCGGGGCGCT 
2451 CGCAGCCGTG^ CGGGCECTGC TGGCGAGCGC TGATGGGGGTT GCGCCAGAGH 
2501 CAGGCTGAGG^ GAGTGCAGAGT TGCCGCCCGC: CCGCCXCCCA. AGATCTECGCT 
2551 TGCGCCCTTG" CCCGGACACG- GCATCGCCCA. CGATGG CZG CT CCCGAGCCA1T 
2601 GGGTCGCGGCT CCACGTAACG* CAGAACGTCC GTCCTCCGCC CGGCGAGTCC 
2651 CGGAGCCAGC CCCGCGCCCC GCCAGCGCTG GTCCCTGAGG CCGACGACAG 
1701 CAGCAGCCTT GCCTCAGCCT" TCCCTTCCGT* CCCGGCCCCG CACTCCTCCC 
2751 CCTGCTCGAG GCTGTGTGTC AGCACTTGGC TGGAGACTTC TTGAACTTGC 
2801 CGGGA'jAGTG ACTTGGGCTC CCCACTTCGC GCCGGTGTCC TCGCCCGGCG 
2851 GATCCAGTCT TGCCGCCTCC AGCCCGATCA CCTCTCTTCC TCAGCCCGCT 
2901 GGCCCACCCC AAGACACAGT TCCCTACAGG GAGAACACCC GGAGAAGGAG 
2951 GAGGAGGCGA AGAAAAGCAA CAGAAGCCCA GTTGCTGCTC CAGGTCCC7C 
3001 GGACAGAGCT TTTTCCATGT GGAGACTCTC TCAATGGACG TGCCCCCTAG 
3051 TGCTTCTTAG ACGGACTGCG GTCTCCTAAA GGTAGAGGAC ACGGGCCGGG 
3101 GACCCGGGGT TGGCTGGCGG GTGACACCGC TTCCCGCCCA ACGCAGGGCG 
3151 CCTGGGAGGA CTGGTGGAGT GGAGTGGACG TAAACATACC CTCACCCGGT 
3201 GCACGTGCAG CGGATCCCTA GAGGGGTTAG GCATTCCAAA CCCCAGATCC 
3251 CTCTGCCTTG CCCACTGGCC TCCTTCCTCC AGCCGGTTCC TCCTCCCCAA 
3301 GTTTTCGATA CATTATAAGG GCTGTTTTGG GCTTTCAAAA AAAAAAATGC 
3351 AGAAATCCAT TTAAGAGTAT GGCCAGTAGA TTTTACTAGT TCATTGCTGA 
3401 CCAGTAAGTA CTCCAAGCCT TAGAGATCCT TGGCTATCCT TAAGAAGTAG 
3451 GTCCATTTAG GAAGATACTA AAAGTTGGGG TTCTCCATGT GTGTTTACTG 
3501 ACTATGCGAA TGTGTCATAG CTTACACGTG CATTCATAAA CACTATCTAT 
3551 TAGTTAATT GCAGGAAGGT GCATGGATTT CTTGACTGCA CAGGAGTCTT 
3601 GGGGAAGGGG GAACAGGGTT GCCTGTGGGT CAACCTTAAA TAGTTAGGGC 
3651 GAGGCCACAA CTTGCAAGTG GCGTCATTAG CAGTAATCTT GAGTTTAGCG 
3701 CTTACTGAAT CTACAAGTTT GATATGCTCA ACTACCAGGA AATTGTATAC 
3751 AGCGCCTCTA AGGAAGTCAC TTGTGCATTT GTGTCTGTTA ATATGCACAT 
3801 GAGGCTGCAC TGTATAAGTT TGTCAGGGAT GCAGTGTCCG ACCAACCTAT 
3851 GGCTTCCCAG CTTCCTGACA CCCGCATTCC CAGCTAGTGT CACAAGAAAA 
3901 GGGTACAGAC GGTCAAGCTC TTTTTAATTG GGAGTTAAGA CCAAGCCCCA 
3951 AGTAAGAAGT CCGGCTGGGA CTTGGGGGTC CTCCATCGGC CAGCGAGCTC 
4001 TATGGGAGCC GAGGCGCGGG GGCGGCGGAG GACTGGGCGG GGAACGTGGG 
4051 TGACTCACGT CGGCCCTGTC CGCAGGTCGA CCATGGTGGC CGGGACCCGC 
4101 TGTCTTCTAG TGTTGCTGCT TCCCCAGGTC CTCCTGGGCG GCGCGGCCGG 
4151 CCTCATTCCA GAGCTGGGCC GCAAGAAGTT CGCCGCGGCA TCCAGCCGAC 
4201 CCTTGTCCCG GCCTTCGGAA GACGTCCTCA GCGAATTTGA GTTGAGGCTG 
4251 CTCAGCATGT TTGGCCTGAA GCAGAGACCC ACCCCCAGCA AGGACGTCGT 
4301 GGTGCCCCCC TATATGCTAG ATCTGTACCG CAGGCACTCA GGCCAGCCAG 
4351 GAGCGCCCGC CCCAGACCAC CGGCTGGAGA GGGCAGCCAG CCGCGCCAAC 
4401 ACCGTGCGCA CGTTCCATCA CGAAGGTGAG CGGGCGGCGG GTGGCGGGGC 
4451 GGGGACGGCG GGCGGGCGGA GACTAGGCGG GCAGCCCGGG CCTCCACTAG 
4501 CACAGTAGAA GGCCTTTCGG CTTCTGTACG GTCCCCTCTG TGGCCCCAGC 
4551 CAGGGATTCC CCGCTTGTGA GTCCTCACCC TTTCCTGfiCA AGTAGCCAAA 
4601 AGACAGGCTC CTCCCCCTAG AACTGGAGGG AAATCGAGTG ATGGGGAAGA 
4651 GGGTGAGAGA CTGACTAGCC CCTAGTCAGC ACAGCATGCG AGATTTCCAC 
4701 AGAAGGTAGA GAGTTGGAGC TCCTTAAATC TGCTTGGAAG CTCAGATCTG 
4 751 TGACTTGTGT TCACGCTGTA GTTTTAAGCZ AGGCAGAGjCA AGGGCAGAAT 
4 801 GTTCGGAGAT AGTATT AG CA AATCAAATCC" AGGGCCTCAA AGCATTCAAA 
4 8 51. TTTACTGTTC ATCTGGGCCT AGTTTGAAAG" ATTTCTGAAT.* CCCTATCTAA 
4901 TCCCCGTGGG' AGATCAATTC CACAATTCGT CATATTGTTT CCACAATGAC 
^951 CTTCGATTCT TTGCTTAAAT" CTTAAATCTC CAAGTGGAGA. CAGCGCAACG 
50 Or CTTCAGATAA AAGCCTTTCT CCCACTG CCT GCTACCTTCCT TAGGCAAGGC 
5051- AATGGGGTTTT TTAAACAAAT" ATATGAATAT.* GATTTCCCAA. GATAGAATAA 
5101. TGTTGTTTAT TTCAGCTGAA ATTTCCTGGA TTAGAAAGGC TGTAGAGGCC 
5151. TATTGAAGTO TCTTGCACCG: ATGTXCIGAA AG CAGTTAGH AAAAAATCAT 
5203. GACCTAGCTC AATTCTGTGi: GTGCCACmr CAATGTGCTTT TXGACITAAT" 

52 5 L GTATTCTCCA. TAG AACATCA . GTTCCTTCAA1 GTTCTAGAAG.' AATICAGATT 

53 01_ TAAAGTTTTC CTTTGCCITG" CTGAGGGGAT AAATTTTAAGT TAGAAATCTA 
5351- GGCTCTGAAA* TGATAGCCCA- ACCCCATCTC CAGTAAGGGA" TGACTGACTC 
5401 AAACCTTGAC AAGTCFGGGT" GATAATAGGA AAAGTCCAGA. AGCAGGTCAC 

5451 agagcgcgag: atggatctgt cttgaggcag CCAATGGTTA TGAAGGGCAC 

5501 m GGAAATCCA TCTCTTTCAA ACTGGTGTCT AGGGCTTTCT" GGGAGCAAAG 
5 551 CTTAGACCAC ATTCTGCTCC TCAAGGTTTG CCTACTGAAA GCAGGGAGAC 
5601 TCTGGGTGTT CACCCCCATC CTTCACCCCC AGGTGATTCT GGGCTTAGCT 
5651 AATCTCTCCT GGTTAATATT CATTGGAAAG TTTTTATAGA TCAAAACAAA 
5701 CAAACCTACT ATCCAGCACA GGTGTTTTTC CCACTGCCTC TGGAGATATA 
5751 GCAAGAAAAC CATATATTCA TGTATTTCCT TATTAGTCTT TTCTAACGTG 
5801 AAAATTATTC CTGACCTATA AAAAATGAAG GAGGTATTTT ATCTTAACTA 
5851 AGCTAAAAGA ATCGCTTAAG TCAATTGAAA CTCAAAAATC CAATTGAATG 
5901 AAAGGTTCGT CAATAAAAAT CTACATTTTT CTTACTCTTC CTTTGGAAAT 
5951 AGCTTGATAA AAACACAGAC AAAACAAAGT CTGTGTGCTT ATTTGAAAAC 
6001 TTAGTGAGCT TCAGTTCATA AGCAAAAAAT GTAGTTTAAA AGTGATTTTT 
6051 C7CTTGTAAA ACGTGATAGA AGTTATTGAC TTGTTTAAAA TAAACTTGCA 
6101 CTAACTTTAT ACCTTGGTGC AATTAGATGT AATGTTTACT GTAAATTTCA 
6151 GGAAAACCAT TTTTTTTTTT TGGTCATGAT CAGGTACACA TGGCATTTGG 
6201 GAAGACTTTT CACATTGTTG AGTAACCTAG AGTTTGTTTG TTTGTTTGTT 
6251 TGTTTTTAAG CATTCTTGTG CCACTAGAAA AACCTTAATA AGCCATGTGT 
6301 TACTTGGTAG ACTTCTTCCT AAGTTCTAGA AAGTGGCTTA ATGCCACGAT 
6351 GAGACAAAAC ATACCATAGT AGTCTTTCAA CCAGTGGCAG AGTCTTCCAG 
6401 ACAAAATCTC CTGTTGAACA TTAAGACCAT GGATTTTTAT CCAGGAGAGC 
6451 CCAGGCTTTG CTGAATCACC ACCCTCCAAC CCCACTCCAA GGTCACCGAA 
6501 GGCCTCCCCA ACTGGCTGCC ATTGAGAAAC TGTTTGAAAT TGATTGACTC 
6551 CATTGGCCCT ACAGAGACTT CTCCTTTAGT GGCAGATCAT ATACTGAAGG 
6601 ATCCAAGCTT GCTCTTCTGA CTATGAAGAG CACAGTCT21T CTTTTTCTTT 
6651 ATGGAATAAA CAAACTATGT GGCCCTGTGA CTAAAGTTTT CAAAGAGGGA 
6701 GAGATCCTGT TAGCAGAAGT GCAACTGCCC AGAAACTAGC CACAGGCTAG 
6751 GATATTCCAA AGTACAACTC TAAAGTATGG TCCATCCTAA ATTCTAGCAT 
6801 GGGGTTGAAT ACCGGCATCC AGGAATACTT CTCTCTACCT CTGGCTATTG 
6851 CAGTGAGATT ACGAAGACCC TGGGGGGAAA AACAGTTGCT TAGTTTACAG 
6901 ATGTTCCTTG CCACAGATGT TCTCAGTATC TCTTGTTTGT CAGAGGATCC 
6951 TTTCAATCCC TCTTGACATT TCCAATCTGC TTTTGTCCTC TCTACATGTG 
7001 CCTTGTGGCA TTTCGCTTGG TCTTTAGAGA ATCCCTTTCT GGAGCTGCAG 
7051 GTTCCCTTGT AGGATCTGTG TTCAGGAGAA CAGGGACCTT GGCAGGTTAG 
7101 TGACAACTAC CAAACCCTGC TTTCCTTCCC TGCCACTTCC TTTGTTGCCT 
7151 TAAAAATTAA ACCTTAACTC TCTGTGTCTA AACCTTTTCT TCTTCCTCTT 
72.01 TGTCATTTAC TTTATTTATT* TGTCATGTAC TTTATCCTGT AGAAAATCAC 
7251 AGTGTGGCCC" AAAGCCCCTT GAATCTTGTT GCAGCGGTGA GATGCAGCTG 
73 01 CTGATCTGGA ATAGCCTTAG" GCTGTGTGTT: TGATCACAAT' GCTTTCTGTC 

73 51 CAAAAGTGTG* CAAATCCTCC AAGCTTAATG ATAACTTTTG AAATGAAACT 

74 01 CACCCTACTT TAGGGCAAAC" AAGTAGCCAC AGAGAGCAGG ATCTAAACAA 
74 51 GGTCTGGTGT CCCATTTGGC TGTGTCCCTH CAATTTTCTG TTCATTTAGC 
7501 TCTGTCTGCA TCTAAAGGGT GCTGGGCAAT' AAGTTTTGAT" CTTCAGGGCA 
7551. AAA CT CAATC TTCAGTTACC ATGGTATCAG. GTACCAATTC CTAGTGATTT' 
T6 01 GTGCTATGGC: TTAGGATTTG* ATTTCTCTCC" TACATTAGGT AATATCTTTC 
7651 AATGGCTAGA. ACXTGGGCATT TGCAGTACAC TCAAGTTAAC: AGTTCTGTGA 
7701 CCTAAGGAAG* TCACATAACC TCTCTGAATTT CTCTACTGTT" TCATTCACAA 
775 L AATGGAGAAA ATCATGGCTC TTTCETAATG" TG CG AATTC A TAGAAAGGTG* 
7S0 1 ATGACACCAG. ATTTGGCAGA^ AGGAAGGAAA. GGAAGGAAGGI AAGAAAGAAA^ 

78 51 GAAAGAAAGA AAGAAAGAAA. GAAAGAAAGA AAGAAAGAAA GGAAGGAAGG" 

79 01 GAG A GAG AG A GAAGGGAAGG^. GAAAGGGAAA GGG AAAGGAA AGAAAAGAAA 
7951. GG AAGGAAGA- AAAGGAAGGA^ AGGAAGGAAA, GAAGGAAGGA^ AGGAAAAGAAL 
8 0011 AGAGAAGAAA-. GCATXCAGCA, TATGAACTAA. TGTTTCCXGG: TGACTTTTTAl 
80531* TATCATATCC TTGTTCIAG^ AAGTGGCCCT AGCCATATCF TTTGGGTTAT: 
8 10 1 TTTGAGGTAC^ AGGATAATCAL ACATAGTGTA, G AACATTAAA^ TCTGGGCTTir 
8153- GTTTCTAGAA- G AG G CT AG AA- TGGCATGGCTT GTCCCACTT& CZCCTCTTTC: - 
3201 AGGCAGTATGT GCAGCCACCA TTCTCTCTGT' AAGATCTAGCT AGGCTG A CACT 
3 251 TCAGGTTGGA GACAGGTCAG AATCCTGAAA TCACTTAGCA AGTTCAGCTG* 
33 01 ATTCAACAAG GGATATTTACT AGAGAATTAA CAGCTATTCC AGCTTCCAAA 
33 51 AAGTGTACAT TACCTACTCT GTATTTTCAG AACCCCAGGT TTGCTGTGAT 
8401 AATTTGGTAG AAGCCTTTTC CTGTAATTTT CTTTATTTAA AAGATATTTT 
8451 CATTTTCCAC CCTCAAGAAG AGGTTGAAAC TTGTCCCTTG AAGTAGAAGA 
8501 GGTGTTGTGT GTCCTGACCC TCAGGAAGTT GGCCTTGTTG AGGTCTTCTG 
8551 TAAATTCTTG AATTCTCTGT ATAATTTCAA TGAATAGTCA TGTTTGATAC 
8601 CTTGGTATAA AGGATGGGAT AAGATCTTTC AAGGCTTAGG CTGATGGAAA 
8651 CGCTGCTGAA AGACTAGAGA TTGCTCTTTC CTTTGGCATC TGTCTTGGGT 
8701 AGTAATATTG TTCTCTGTGA AGGCCCACTT ATTCTGTCTT GAAAATTCTT 
8751 CTTACCTCCA GAGTGATAGG CCACAGGGAG TACTGTTTCT ATGTTTGCAG 
8801 TTGAAAGATG ACAATTTCAT ATGGTCCAAA CTTGGCTTTA TTTCTTGGTG 
8851 AGATATTATT CTGTTACTTC AATGACCTGT CTCCATTATT TATCTTGAGG 
8901 CTCACCTCTT CCCTTTTGTT GACTGTTGTG CAATTTGTGG AAGGCCCTGG 
8951 GTAGTCAGCC TTTATACTCT GTCTGTACAG GAAATAAAGT GCATGTCACC 
9001 ATGCCAAAGT CAGGAGATGC CGGTGTGATT AGGGTCCACG GGATTTTGCT 
9051 ACTGTTTTTA TTTCTATCGA TGAATTGCCT TAGGCAGAAA CATTAAGGGA 
9101 CACCAGAATG GTGATGAAAG GCTTTTTATA ACAGAAGCTA AATGCAGTCC 
9151 TTCATACTTC ATGGAATGCC CCTGTCCTAA AGTACCATTA ACCGATAGTG 
9201 GAGTCAGAAC ATAAATGGCT CCCCAAAGGT ATCACCAAGA ACTTTTGGCA 

9 251 AACAGATGCA AG AG G ATT AT GAAGAATCGC AGCTTGGTCT GGTAATCTTC 
93 01 CTGTTGCAAA GAGAAGAGCT TTAGAAGACC CCCCTTGAGT CCCTGGCTGG 

93 51 CTTAACATAG CATGAACCCT CATGTGTTGG CCAACATTAA GGCTTTTTCT 
9 401 ATAAAAGTCT CCTCCTTCAT CAGTATACGC TCGAGTATGA AAAGCATCCT 

94 51 TTTAAACCTT GACTCTGTGT GGTCCAGAAA CAGCAGCATC CCTTGCTTAA 
9501 GAGCTTAATG GAGATGCAGG AGTGCAGGCC TCTTCCCAGA CCGGCTG'ATG 
9 551 TGCAGGTCAA AGTCTAAGCA CTGCTGGATC AACACAGAAG TTATTCCGAA 
9 601 TGAGGATGAG ATGGATACGA GAGAACAGGA AGTAGGAAGG GATTTCTTTA 
9 651 TCGTGAATTG CTACAGCAGC CTAATGTCAC CCCATACCCT TCTGAAGAAC 
9701 TATGTCCCTG TGGATGCCTT TGTCTCTAGA GTTCTGAGCA AAATG GTAGG 

97 51 GTGTGCTTTG CAAAATGTCA TCATTGATGT TGAATTTCAA AGTCTTTAAT 

98 01 TAAGGGGCTG AAATCTGTAT ATTGAGATTT GTAAATCATC TAAATTGTAG 
9 851 AGTAATGTTT GCACAGGCTG CTTAAGGGAT TGACATTAAA GCTCGTTTTC 
9901 TTAGTTAAGA AATACAGTCA TTTCCTCAAC TCCTCAGTCA TTAGCTCTCT ' 
9951 ACTAAGTACA GTGCTGACTT ' TTTTAAAATT AAAGTCTGTG AATTCCAAAG 

10001 AAGTGTTTCA CTATTTCCTC CATTATTATA GCTACCTAGA AGCTATGTTC 

10051 ATATATTGGA TTAAAAACGT AGCAATTACA AAGTTAATGT GGCCATATAG 

10101 AAAAGGGAAA AG AAACTCCG CTTTCACTTT AATATATATA TGTGTGTGTG 

1015L TATATCATAT ATATACATGT TGTGTGTGTA TATATATATA TATATATATA 

10201 TATATATATA TATATATATA TATATATATA TGTTGTGTTA AGCAGTAAAC 

10251 TCAGGCCATG GACAGAGGGG CAGACATTGT ATCTCTAGGC CTGACATTTT 

103 01 TAATTTCTGG TTGCAGGTTT TTATGTAGTT TAACTTAAAC CATGCACTGA 

103 51 AGTTTTAAAH GCTCGTAAGG AATTAAGTTA CCATTGGCTC TCTTACCAAA 

104 01 TGCGTTTCTT* TTTTCTCTCC ACCCTGATCA AACTAGAAGC CGTGGAGGAA 
104 5]- CTTCCAGAGA TGAGTGGGAA AACGGCCCGG. CGCTTCTTCT" TCAATTTAAG 
10501 TTCTGTCCCC* AGTGACGAGT TTCICACATC TGCAGAACTC CAG ATCTTCC 
10551 GGGAACAGATT ACAGGAAGCT TTGGGAAACA GTAGTTTCCA GCACCGAATT 
10601- AATATTTATG AAATTATAAA GCCTGCAGCA GCCAACTTGA AATTTCCTGT" 
10651 GACCAGACTA TTGGACACCA GGTTAGTGAA TCAGAACACA. AGTCAGTGGG. 
10701 AGAGCTTCGA CGTCACCCCA GCTGTGATGC GGTGGACCAC ACAGGGACAC 
10751. ACCAACCATG: GGTTTGTGGTT GG AAGTGGCC CATTTAGAG<r AGAACCCAGff 

108 or tgtctccaag: agacatgtga. ggattagcag: gtctttgcac: caagatgaac: 

108 5]. ACAGCTGGTC ACAGATAAG&* CCATTGCTAG" TGACTXTTGG: ACATGATGGA. 
10901 AAAGGACATC CGCTCCACAA ACGAGAAAAC CGTCAAGCCA. AACACAAACA 

109 5L GCGGAAGCGC CTCAAGTCCA. GCTGCAAGAGI ACACCCTTTG:* TATGTGGACE 
11001 TCAGTGATGT GGGGTGGAAT GACTGGATCG" TGGCACCTCC GGGCTATCATT 

110 51 GCCTTTTACT GCCATGGGGA GTGTCCTTTT CCCCTTGCTG ACCACCTGAA 
11101 CTCCACTAAC CATGCCATAC TGCAGACTCT' GGTGAACTCT GTGAATTCCA. 
11151 iAATCCCTAA GGCATGCTGT GTCCCCAC-.G AGCTCAGCGC AATCTCCATG 
11201 TTGTACCTAG ATGAAAATGA AAAGGTTGTG CTAAAAAATT ATCAGGACAT 

112 51 GGTTGTGGAG GGCTGCGGGT GTCGTTAGCA CAGCAAGAAT AAATAAATAA 

113 01 ATATATATAT ITT AG AAA C A GAAAAAACCC TACTCCCCCT GCCTCCCCCC 

113 51 CAAAAAAACC AGCTGACACT TTAATATTTC CAATGAAGAC TTTATTTATG 

114 01 GAATGGAATG AAAAAAACAC AGCTATTTTG AAAATATATT TATATCGTAC ' 
114 51 G AAAAG AAGT TGGGAAAACA. AATATTTTAA TCAGAGAATT ATTCCTTAAA 
11501 GATTTAAAAT GTATTTAGTT GTACATTTTA TATGGGTTCA ACTCCAGCAC 
11551 ATGAAGTATA AGGTCAGAGT TATTTTGTAT TTATTTACTA TAATAACCAC 
11601 TTTTTAGGGA AAAAAGATAG TTAATTGTAT TTATATGTAA TCAGAAGAAA ' 
11651 TATCGGGTTT GTATATAAAT TTTCCAAAAA AGGAAATTTG TAGTTTGTTT 
11701 TTCAGTTGTG TGTATTTAAG ATGCAAAGTC TACATGGAAG GTGCTGAGCA 
11751 AAGTGCTTGC ACCACTTGCT GTCTGTTTCT TGCAGCACTA CTGTTAAAGT 
llo 01 TCACAAGTTC AAGTCCAAAA AAAAAAAAAA AG G AT AATCT. ACTTTGCTGA 
113 51 CTTTCAAGAT TATATTCTTC AATTCTCAGG AATGTTG C AG AGTGGTTGTC 
119 01 CAATCCGTGA GAACTTTCAT TCTTATTAGG GGGATATTTG GATAAGAACC 
11951 AGACATTACT GATCTGATAG AAAACGTCTC GCCACCCTCC CTGCAGCAAG 
12 001 AACAAAGCAG GACCAGTGGG AATAATTACC AAAACTGTGA CTATGTCAGG 
12 051 AAAGTGAGTG AATGGCTCTT GTTCTTTCTT AAGCCTATAA TCCTTCCAGG 
12101 GGGCTGATCT GGCCAAAGTA CTAAATAAAA TATAATATTT CTTCTTTATT 
12151 AACATTGTAG TCATATATGT GTACAATTGA TTATCTTGTG GGCCCTCATA 
12201 AAGAAGCAGA AATTGGCTTG TATTTTGTGT TTACCCTATC AGCAATCTCT 

122 51 CTATTCTCCA AAGCACCCAA TTTTCTACAT TTGCCTGACA CGCAGCAAAA 

123 01 TTGAGCATAT GTTTCCTGCC TGCACCCTGT CTCTGACCTG TCAGCTTGCT 

123 51 TTTCTTTCCA GGATATGTGT TTGAACATAT TTCTCCAAAT GTTAAACCCA 
12 401 TTTCAGATAA TAAATATCAA AATTCTGGCA TTTTCATCCC TATAAAAACC 

124 51 CTAAACCCCG TGAGAGCAAA TGGTTTGTTT GTGTTTG C AG TGTCTACCTG 
12501 TGTTTG C ATT TTCATTTCTT GGGTGAATGA TGACAAGGTT GGGGTGGGGA 
12 551 CATGACTTAA ATGGTTGGAG AATTCTAAGC AAACCCCAGT TGGACCAAAG 
12601 GACTTACCAA TGAGTTAGTA GTTTTCATAA GGGGGCGGGG GGAGTGAGAG 

12 651 AAAGCCAATG CCTAAATCAA. AGCAAAGTTT GCAGAACCCA AGGTAAAGTT 
12701 CCAGAGATGA TATATCATAC AACAGAGGCC ATAGTGTAAA AAAATTAAAG' 
12751 AATGTCTGAT CAGCGTCTCA GCACATCTAC CAATTGGCCA GATGCTCAAA 
12 SOL CAGAGTGAAG TCAGATGAGG' TTCTGGAAAG TGAGTCCTCT* ATGATGGCAG 
12851 AGCTTTGGTG CTCAGGTTGG. AAGCAAAACC TAGGGAGGGA GGGCTTTGTG 
12901 GCTGTTTGCA GATTGGGGAA TCCAGTGCTA GTTCCTGGCA GGGTTTCAGG 
129 5 L TCAGTTTCCG GAGTGTGTGT CCTGTAGCCC TCCGTCATGG TTGAAGCCCA 
13001 GGTCTCACCT CCTCTCCTGA CCCGTGCCTT" AGAACTGACT TGGAAAGCGG 

13 051 TGTGCTTACA GCAAGACAGA CTGTTATAAT TAAATTCTTC CCAAGGACCT 
13101 CCGTG CAATG* ACCCCAAGCA. CACTTACCTE CGGAAACCTT AAGGTTCTGA 
13 151 AGATCTTGTT* TTAAATGACT' ACCCTGGTTA G CTTTTG ATG" TGTTCCXTAT' 
13 201 CCCTTTAGTr GTTGCACAGG TAGAAACGAT" TAGACCCAAC TATGGGTAGC 
13 251 CTTGTCCTCC TGGTCCTTCA GTCATTCTCT AATGTCTCT11" GCTTGCCATG 
13 3 0L GGCACTGTAA. CAAACTGCAA TCTTAACATCT'TTATAAAATG" AATGAACCAC 
13 3 5L ATATTTACAT CTCCAAGTCC TCCAGATGGG" AGTGCGATCA TTCCATAAGG* 
13 4 OL ATCCCACCTT CTGGCAGGTC TATCCAGTAC ATATXTTATG*. CTTCATTGGT: 
13 4 51 CTTGATTTTCr TTGGCTAAAA. TTACTTGTAGT CACAGCAGGC CCCATGTGAC 
13 501 ATATAGGTAX ATACATACAT GTATGTG CATT ATAGTGTGTA CATGTTCTAA 
13 551- TTTATACATA* GCIATGTGAA. GATTATGTTAL CATAIGTAGA^ TGGTCG CACTT 
13 60 1 TCTGATTTCCT ATTTAGGTTC: AGAGAGAGACT GTCACAGTAA: ATGG AG CTAT 
13 6 51. GTCATTGGTA. TAICCCCGAG" TGGTXCAGGT GTTCECTCTA- TTTTTTTAAC^. 
13701. ATGGAGAACA, CTCATCTGTA, CTATCGAAAA1 CTGAGCCAAA; XCACTTAGCA 
1375]_ AATTTCTAGT CACXGCCIT<£ CTGTTAAGATT ACTGATTCAC TGGGTGCTGA 
13801 CATGCTGAGC CCTGCCTACIT* TTTGCATGAA GGACAAGGAA, GAGAGCTTGC 
13 851 AGTTAAGAAT' GGTATATGTG GGGCTAGGGGI GCGGCGTATA. GACTGGCATA 

12 9 01 TATGTGAAGG AAGGTCACAA ACAGCCTGCA CTAATTTCCC TTTTCTGGTT 

13 9 51 TTATGTCTTG GCAGGGGAAA GGACAGGTAG GGTGGGGTTG AGGGGGAGGG 
14001 CACACACATC TACTTGGATA AATTGCATCT CCTCTTTCCT TCACCCCGCC 
14051 ACCATATCTT AAAGCCTTAT GACATCCTCT AGGGCAGAAT TTTCTCACCA 
14101 GCTCCCCGCC CTACCAACTT CAAAGTGAAC TTCTAACTAA CTTGAGGGGC 
14151 CAAAGTTCTA AATAAAACTT GTTAGAGTTT AGCGGGCACC TCAGTCATCA 
14201 GGAATGCCTC CAGGAAAGCA AAAAGCTTGA TGTGTGTACA GCCACGTGGT 
14251 GGAGTCCTGC CACCCTATGA TTCCTGTCCC AGTGGTCGTG TGGGGCCTGA 
14301 GATCCTGAAT TTCTAATGAG CTCCCAGTAC GCCCTGACTC ACTGTGCCAG 
14351 AGGACTGCAG TTTGAGTAGC AAGGTTGTGT GACTGTCTTC GATCATGGCT 
14401 ACAGAAGCTG GCTCAAGTAC AGCCCTTCGT GTGTAAAAGC CATGTGTAAA 
14451 TGAGAAGAAA CAGAAGGCAA AGCTGCGTTG CATGGCATCT GAATCAGTGC 
14501 CCTGCAGTTT TGTTTTTTGT TTTTTTTTTT TCAAAGACAT TCTTTTTCCC 
14551 AACAAGATGA GTGGCAATCT TATGTTCTAG CCACTCTTAG ACATGAAAAC 
14601 ACTGGGTTGC TTATCTTGTA AAATCTGCTC TGCTTGCTTG CTTGGGCACG 
14651 CTGCAGTCAG TTTAGTCAAA TGCGTGTCAG TACATCTATA TGTATGAGGG 
14701 AGCAGGTGCA AGTCCTTAGA AATGTACTTT AAAAAACTTG AACACTTAAC 
14751 TCAGTGTGCT GAGCTGCTCC TGTGTGATGT TAGGCCAAGC ACCTGAGTTA 
14801 AAGGGATCTC TTTGAAGGCA GAGGGTAGAT GTCGTATGGT TGAAGCATTT 
14851 GTTTATACTA AAATGATGCT TGACTTTTTT TCTAAGTTAT AAGACAGTAC 
14901 ACTGTATAAG TTCATTGAAC CTAGAGGGTG GCATAGGACT CCAAATCTGG 
14951 TATGGGAGGT TTGTTCTAAT GGAAGTTCGA ATCTTTTTTG CAGTTGGCTT 
15001 GGAATAAAGT GCTTATGTGA ATGGGCTTAA GCTAGGGAAA AAAATGGGTT 
15051 TCCCTCTGCA AAGAGGGTCA GCACAGAAAT AACTTCCTGG CTTTGCTTGC 
15101 ATGAATGCCA CTTGTTAGCA GATGCCCTGT GGGGATCCGA ATTC 
1 GAATTCGCTA GGTAGACCAG GCTGGCCCAG AACACCTAGA GATCATCTGG 
51 CTGCCTCTGT CTCTTGAGTT CTGGGGCTAA AGCATGCACC ACTCTACCTG 
101 GCTAGTTTGT ATCCATCTAA ATTGGGGAAG AAAGAAGTAC AGCTGTCCCC 
151 AGAGATAACA GCTGGGTTTT CCCATCAAAC ACCTAGAAAT CCATTTTAGA 
201 TTCTAAATAG GGTTTGTCAG GTAGCTTAAT TAGAACTTTC AGACTGGGTT 
251 TCACAGACTG GTTGGGCCAA AGGTCACTTT ATTGTCTGGG TTTCAGCAAA 
301 ATGAGACAAT AGCTGTTATT CAAACAACAT TTGGGTAAGG AAGAAAAATG 
351 AACAAACACC ACTCTCCCTC CCCCCGCTCC GTGCCTCCAA ATCCATTAAA 
401 GGCAAAGCTG CACCCCTAAG GACAACGAAT CGCTGCTGTT TGTGAGTTTA 
451 AATATTAAGG AACACATTGT GTTAATGATT GGAGCAGCAG TGATTGATGT 
501 AGTGGCATTG GTGAGCACTG AATCCGTCCT TCAACCTGCT ATGGGAGCAC 
551 AGAGCCTGAT GCCCCAGGAG TAATGTAATA GAGTAATGTA ATGTAATGGA 
601 GTTTTAATTT TGTGTTGTTG TTTTAAATAA TTAATTGTAA TTTTGGCTGT 
651 GTTAGAAGCT GTGGGTACGT TTCTCAGTCA TCTTTTCGGT CTGGTGTTAT 
701 TGCCATACCT TGATTAATCG GAGATTAAAA GAGAAGGTGT ACTTAGAAAC 
751 GATTTCAAAT GAAAGAAGGT ATGTTTCCAA TGTGACTTCA CTAAAGTGAC 
801 AGTGACGCAG GGAATCAATC GTCTTCTAAT AGAAAGGGCT CATGGAGACC 
851 TGAGCTGAAT CTTTCTGTTC TGGATGAGAG AGGTGGTACC CATTGGAATG 
901 AAAGGACTTA GTCAGGGGCA ATACAGTGTG CTCCAAGGCT GGGGATGGTC 
951 AGGATGTTGT GCTCAGCCTC TAACACTCCT TCCAACCTGA CATTCCTTCT 
1001 CACCCTTTGT CTCTGGCCAG TAGAATACAG GAACTCGTTC CTGTTTTTTT 
1051 TTTTTTAAAT TCTGAAGGTG TGTAAGTACA AAGGTCAGAT GAGCGGCCCT 
1101 AGGTCAAGAC TGCTTTGTGG TGACAAGGGA GTATAACACC CACCCCAGAA 
1151 ACCAAGAACC GGAAATTGCT ATCTTCCAGC CCTTTGAGAG CTACCTGAAG 
1201 CTCTGGGCTG CTGGCCTCAC CCCTTCCCTG CAGCTTTCCC TTTAGCAGAG 
1251 GCTGTGATTT CCTTCAGCGC TTGGGCAAAT ACTCTTAGCC TGGCTCACCT 
1301 TCCCCATCCT CGTTTGTAAA AACAAAGATG AAGCTGATAG TTCCTTCCCA 
1351 GCTCCATCAG AGGCAGGGTG TGAAATTAGC TCCTGTTTGG GAAGGTTTAA 
1401 AAGCCGGCCA CATTCCACCT CCCAGCTAGC ATGATTACCA ACTCTTGTTT 
1451 CTTACTGTTG TTATGAAAGA CTCAATTCCT CATCTCCCTT TCCCTTCTTT 
1501 TAAAAAGGGG CCAAAGGGCA CTTTGTTTTT TTCTCTACAT GGCCTAAAAG 
1551 GCACTGTGTT ACCTTCCTGG AAGGTCCCAA ACAAACAAAC AAACAAACAA 
1601 AATAACCATC TGGCAGTTAA GAAGGCTTCA GAGATATAAA TAGGATTTTC 
1651 TAATTGTCTT ACAAGGCCTA GGCTGTTTGC CTGCCAAGTG CCTGCAAACT 
1701. ACCTCTGTGC ACTTGAAATG TTAGACCTGG' GGGATCGATG GAGGGCACCC 
1.7 5 1 AGTTTAAGGG GGGTTGGTGC AATTCTCAAA TGTCCACAAG. AAACATCTCA 
18 01 CAAAAACTTT TTTGGGGGGA AAGTCACCTC CTAATAGTTC AAGAGGTATC 

18 51 TCCTTCGGGC ACACAGCCCT GCTCACAGCC TGTTTCAACG TTTGGGAATC 

19 OL CTTTAACAGT TTACGGAAGG CCACCCTTTA AACCAATCCA ACAGCTCCCT* 
19 5 L TCTCCATAAC: CTGATTTTAG AGGTGTTTCA TTATCTCTAA TTACTCGGGG 
2001 TAAATGGTGA TTACTCAGTG* TTTTAATCAT: CAGTTTGGGC AGCAGTTATT 
2051_ CTAAACECAG. GGAAGCCCAC. ACTCCCATGG* GTATTTTTGG" AAGGTACAGA 
2L01. GACTAGTTGG" TGCATGCEXT CTAGTACCtC TTGCATGTGG* TCCCCAGGTG. 
215 L AGCCCCGGCZ GCTTCCCGAG. CTGGAGGCAT CGGTCCCAGC CAAGGTGGCA 
220 1. ACEG AGGG CH GGGGAGCTGT GCAATCTTCC GGACCCGGCC- TTGCCAGGCG 
225L AGGCGAGGCC CCGTGGCTGG* ATGGGAGGAH GTGGGCGGGG CTCCCCATCC 
23 OL CAGAAGGGGA. GG CG ATT AAG" G G AGGAGGG A AGAAGGGAGG" GGCCGCTGGG. 
23 51_ GGGAAAGACT G GGGAGG AAG* GGAAGAAAGA. G AG G GAGGG A- AAAGAGAAGG, 
Z4011 AAG G AGTAG A. TGTGAGAGGa* TGGTGCtGAC GGTGGGAAGG; CAAGAGCGCG; 
2.4 5L. AGG CCTGG CC CGGAAGCTAG" GTGAGTTCGGT CATCCG AG CTT GAGAGACCCC 
250 1. AGCCTAAGACT GCCTGCGCTG" CAACCCAGCC TGAGTATCTG; GTCTCCGTCC 
2551. CTGATGGGAT TCTCGTCTAA ACCGTCTTGGT AGCCTGCAGCT GAXCCAGTCT 
260 L CTGGCCCTCG". ACCAGGTTCA TTGCAGCTTT" CTAGAGGTCC CCAGAAGCAG 
265 L CTGCTGGCGA GCCCGCTTCT GCAGGAACCA ATGGTGAGCA GGGCAACCTG 
270L GAGAGGGGCG CTATTCTGAG GATTCGAGGT GCACCCGTAGT TAGAAGCTGG 
275 L GGhZlGGGGCT CAGGCTGTAA CCGAGGCAAA AGTTGGCCTA TTCCTCCTTC 
2801 CTTCTCCAAC AGTGTTGGAG GTGGGATGAT GGAGGCTAAA AGGCACCTCC 
2851 ATATATGTTA CTGCGTCTAT CAACCTACTT TAGGGAGGTG CGGGCCAGGA 
2901 GAGGCGGGAA GGAGAGAAGG CCTTGGAAGA GAGGTCATTG GGAAGAACTG 
2951 TGGGGTTTGG TGGGTTTGCT TCCACTTAGA CTATAAGAGT GGGAGAGGAG 
3001 GGAGTCAACT CTAAGTTTCA ACACCAGTGG GGGACTGAGG ACTGCTTCAT 
3051 TAGGAGAGAG AACCTAGCCA GAGCTAGCTT TGCAAAAGAG GCTGTAGTCC 
3101 TGCTTTGCTC TAAAGCGCGA CCCGGGATAG AGAGGCTTCC TTGAGCGGGG 
3151 TGTCACCTAA TCTTGTCCCC AACGCACCCC CTCCCAGCCC CTGAGAGCTA 
3201 GCGAACTGTA GGTACACAAC TCGCTCCCAT CTCCAGGAGC TATTTTCTTA 
3251 GACATGGGCA CCCATGATTC TGCCTTCTGG TACTCTCCCC TCCCTGGGAA 
3301 AGGGGTGTAA GGTTCCGACG GAACCGTGGC CAGGATGCCG AAAGGCTACC 
3351 TGTGCGGGTC TTCTGCCATG CTGTGTCTGT GCGGACATGC CAGCAGGGCT 
3401 AATGAGGAGC TTGCGATACT CCAAAGGGTT CGGGAATTGC GGGGTCCTTA 
3451 CACGCAGTGG AGTTGGGCCC CTTTTACTCA GAAGGTTTCC GCCACGGCTT 
3501 TGGTTGATAG TTTTTTTAGT ATCCTGGTTT ATGAACTGAA GGTTTTGTGA 
3551 GATGTTGAAT CACTAGCAGG GTCATATTTG GCAAACCGAG GCTACTATTA 
3601 AATTTTGGTT TTAGAAGAAG ATTCTGGGGA GAAAGTGAAG GGTAACTGCC 
3651 TCCAGGAGCT GTATCAACCC CATTAAGAAA AAAAAAAATA CCAGGAGATG 
3701 AAAATTTACT TTGATCTGTA TTTTTTAATT AAAAAAAATC AGGGAAGAAA 
3751 GGAGTGATTA GAAAGGGATC CTGAGCGTCG GCGGTTCCAC GGTGCCCTCG 
3801 CTCCGCGTGC GCCAGTCGCT AGCATATCGC CATCTCTTTC CCCCTTAAAA 
3851 GCAAATAAAC AAATCAACAA TAAGCCCTTT GCCCTTTCCA GCGCTTTCCC 
3901 AGTTATTCCC AGCGGCGACG CGTGTCGGGG AATAGAGAAA TCGTCTCAGA 
3951 AAGCTGCGCT GATGGTGGTG AGAGCGGACT GTCGCTCAGG GGCGCCCGCG 
4001 GTCTCTGCAC CCAGGGCAGC AGTGTGGGAT GGCGCTGGGC AGCCACCGCC 
4051 GCCAGGAAGG ACGTGACTCT CCATCCTTTA CACTTCTTTC TCAAAGGTTT 
4101 CCCGAAAGTG CCCCCCGCCT CGAAAACTGG GGCCGGTGCG GGGGGGGGGA 
4151 GAGGTTAGGT TGAAAACCAG CTGGACACGT CGAGTTCCTA AGTGAGGCAA 
4201 AGAGGCGGGG TGGAGCGGGC TCTGGAGCGG GGGAGTCCTG GGACTCGGTC 
4251 CTCGGATGGA CCCCGTGCAA AGACCTGTTG GAACAAGAGT TGCGCTTCCG 
4301 AGGTTAGAAC AGGCCAGGCA TCTTAGGATA GTCAGGTCAC CCCCCCCCCC 
4351 AACCCCACCC GAGTTGTGTT GGTGAATTTC TTGGAGGAAT CTTAGCCGCG 
4401 ATTCTGTAGC TGGTGCAAAA GGAGGAAAGG GGTGGGGGAA GGAAGTGGCT 
4451 GTGCGGGGGT GGCCGTGGGG GTGGAGGTGG TTTAAAAAGT AAGCCAAGCC 
4501 AGAGGGAGAG GTCGAGTGCA GGCCGAAAGC TGTTCTCGGG TTTGTAGACG 
4 551 CTTGGGATCG CGCXTGGGGT: CTCCTTTCGT GCCGGGTAGG* AGTTGTAAAG 
4 601 CCTTTGCAAC TCTGAGATCG. TAAAAAAAAT GTGATGCGCT. CTTTCTTTGG 
4 65L CGACGCCTGT TTTGGAAXCT- GTCCGGAGTT AG AAG CTCAG* ACGTCCACCC 
47 OL CCCACCCCCC GCCCACCCCC TCTGCCTTGA ATGGCACCGC CGACCGGTTT 
4r75L CTGAAGGATC TGCTTGGCTG* GAGCGGACGC TGAGGTTGGC AGACACGGTG 
4801 TGGGGACTCT GGCGGGGCTA. CTAGACAGTA CTTCAGAAGC CGCTCCTTCX 
48 5L AACTTTCCCA CACCGCTCAA ACCCCGACAC CCCCGCGGCG GACTGAGTTG 
4901 GCGACGGGGT CAGAGTCTTC; TGGCTGAAAG. TTAGATCCGC TAGGGGTCGG" 
495L CTGCCTGTCG"- CTAG AAG CAT TATTTGGCCT CTCGGAGACC* CGTGTGGAGG: 
5001 AAGTGCTGGA*. GTGTGCGAGT. GTGTTTGCGT GTGTGTGTGT GTGTGTGTGT 
50 5 L GTGT G TGTGTT GTGTGTGTGT GTGCGCGCGC CCXTGGAGGG TCCCTATGCG 
51*01 CTTTCCTrTTT CATGGAACGC TGTCGTGAGG CTTTGGTAAA CTGTCTTTTC 
515*1 GGTTCCTCXC TCGGCTGCACT TTAAGCTTTG7 TCGGCGCXGT AAAGAGACGC 
520 1 GTCTTCAAGT GCACCCTGATT CCTCAGGCTT CAGATAACCCr gtccccgaac: 

5251 ctggccagat GCATTGCAcnr gcgcgccgca ggtagagacgt tgccccacgt

5301 CCCCTGCGTG CAGCGACEAC GACCGAGAGC CGCGCCAGTC TGGTGTCCCG

5351 CCGAGAGnC CTCAGAGCAGT GCGGGGACAA CTCCCAGACG GCTGGGGCTC
5401 CAGCTGOSGG CCCGGAGGTT GGCCZCGCTC GCAGGGGCTG GACCCAGCCG

5451 GGGTGGGAGG ATCGAGGAGG GGCGGGCGGG CTCTTCGGTG AGTGGGGCGG
5501 GGCCTCTGGG TCCACGTGAC TCCTAGGGGC TGGAAGAAAA ACAGAGCCTG
5551 TCTGCTCCAG AGTCTCATTA TATCAAATAT CATTTTAGGA GCCATTCCGT 
5601 AGTGCCATTC GGAGCGACGC ACTGCCGCAG CTTCTCTGAG CCTTTCCAGC

5651 AAGTTTGTTC AAGATTGGCT CCCAAGAATC ATGGACTGTT ATTATGCCTT

5701 GTTTTCTGTC AGTGAGTAGA CACCTCTTCT TTCCCTTCTT GGGATTTCAC 

5751 TCTGTCCTCC CATCCCTGAC CACTGTCTGT CCCTCCCGTC GGACTTCCAT 

5801 TTCAGTGCCC CGCGCCCTAC TCTCAGGCAG CGCTATGGTT CTCTTTCTGG

5851 TCCCTGCAAG GCCAGACACT CGAAATGTAC GGGCTCCTTT TAAAGCGCTC

5901 CCACTGTTTT CTCTGATCCG CTGCGTTGCA AGAAAGAGGG AGCGCGAGGG 

5951 ACCAAATAGA TGAAAGGTCC TCAGGTTGGG GCTGTCCCTT GAAGGGCTAA 

6001 CCACTCCCTT ACCAGTCCCG ATATATCCAC TAGCCTGGGA AGGCCAGTTC 

6051 CTTGCCTCAT AAAAAAAAAA AAAAAAACAA AAAACAAACA GTCGTTTGGG 

6101 AACAAGACTC TTTAGTGAGC ATTTTCAACG CAGCGACCAC AATGAAATAA 

6151 ATCACAAAGT CACTGGGGCA GCCCCTTGAC TCCTTTTCCC AGTCACTGGA

6201 CCTTGCTGCC CGGTCCAAGC CCTGCCGGCA CAGCTCTGTT CTCCCCTCCT 

6251 CCTGTTCTTA ACCAGCTGGA AGTTGTGGAA ATTGGGCTGG AGGGCGGAGG 

6301 AAGGGCGGGG GTGGGGGGGT GGAGAAGGTG GGGGGGGGGG AGGCTGAAGG 

6351 TCCGAAGTGA AGAGCGATGG CATTTTAATT CTCCCTCCNC CTCCCCCCTT 

6401 TACCTCCTCA ATGTTAACTG TTTATCCTTG AAGAAGCCAC GCTGAGATCA

6451 TGGCTCAGAT AGCCGTTGGG ACAGGATGGA GGCTATCTTA TTTGGGGTTA

6501 TTTGAGTGTA AACAAGTTAG ACCAAGTAAT TACAGGGCGA TTCTTACTTT 

6551 CGGGCCGTGC ATGGCTGCAG CTGGTGTGTG TGTGTGTAGG GTGTGAGGGA 

6601 GAAAACACAA ACTTGATCTT TCGGACCTGT TTTACATCTT GACCGTCGGT 

6651 TGCTACCCCT ATATGCATAT GCAGAGACAT CTCTATTTCT CGCTATTGAT 

6701 CGGTGTTTAT TTATTCTTTA ACCTTCCACC CCAACCCCCT CCCCAGAGAC

6751 ACCATGATTC CTGGTAACCG AATGCTGATG GTCGTTTTAT TATGCCAAGT

6801 CCTGCTAGGA GGCGCGAGCC ATGCTAGTTT GATACCTGAG ACCGGGAAGA 

6851 AAAAAGTCGC CGAGATTCAG GGCCACGCGG GAGGACGCCG CTCAGGGCAG

6901 AGCCATGAGC TCCTGCGGGA CTTCGAGGCG ACACTTCTAC AGATGTTTGG

6951 GCTGCGCCGC CGTCCGCAGC CTAGCAAGAG CGCCGTCATT CCGGATTACA

7001 TGAGGGATCT TTACCGGCTC CAGTCTGGGG AGGAGGAGGA GGAAGAGCAG 

7051 AGCCAGGGAA CCGGGCTTGA GTACCCGGAG CGTCCCGCCA GCCGAGCCAA 

7101 CACTGTGAGG AGTTTCCATC ACGAAGGTCA GTTTCTGCTC TTAGTCCTGG

7151 CGGTGTAGGG TGGGGTAGAG CRCCGGGGCA GAGGGTGGGG GGTGGGCAGC

7201 TGGCAGGGCA AGCTGAAGGG GTTGTGGAAG CCCCCGGGGA AGAAGAGTTC 

7251 ATGTTACATC AAAGCTCCGA GTCCTGGAGA CTGTGGAACA GGGCCTCTTA

7301 CCTTCAACTT TCCAGAGCTG CCTCTGAGGG TACTTTCTGG AGACCAAGTA

7351 GTGGTGGTGA TGGGGGAGGG GGTTACTTTG GGAGAAGCGG ACTGACACCA

7401 CTCAGACTTC TGCTACCTCC CAGTGGGTGT TCTTTAGCTA TACCAAAGTC 

7451 AGGGATTCTG CCCGTTTTGT TCCAAAGCAC CTACTGAATT TAATATTACA 

7501 TCTGTGTGTT TGTCAGGTTT ATCAATAGGG GCCTTGTAAT ACGATCTGAA 

7551 TGTTTCCTAG CGGATGTTTC TTTTCCAAAG TAAATCTGAG TTATTAATCC

7601 TCCAGCATCA TTACTGTGTT GGAATTTATT TTCCCTTCXG TAACATGATC

7651 AACAAGGCGT GCTCTGTGTX tctaggatcg CTGGCGAAAX GTTTGGTAAC

7701 ATACTCAAAA GTGGAGAGGG AGAGAGGGTG GCCCCTCXTT TTCTTTACAA

7751 CCACTTGTAA AGAAAACTGT ACACAAAGCC AAGAGGGGGC TTTAAAAGGG 

7801 GAGTCCAAGG GTGGTGGAGTL AAAAGAGTTG ACACATGGAA ATTATTAGGC

7851 ATATAAAGGA GGTTGGGAGA TACTTTCTGT CTTTGGTGTT TGACAAATGT

7901 gagctaagtt ttgctggttt gctagctgct ccacaactct GCTCCTTCAA

7951 ATTAAAAGGC ACAGTAATTTT CCTCCCCTTA GGTTTCTACr ATATAAGCAG

8001 aattcaacca ATTcrGCTAT tttttgtttt tgtttcttgt ttttgttttg
8051 TTTGGTrrrr rrrrrriTr r rrn iTr m gtctcagaaa agctcatggg
8101 ccrrTTcrrr tcccctttca actgtgccta gaacatctgg agaacatccc

8151 AGGGACCAGTT GAGAGCTCTC CTTTTCGTTir CCTCTTCAAC CTCAGCAGCA

8201 TCCCAGAAAA TGAGGTGATCl TCCTCGGCAG AGCTCCGGCT CTTTCGGGAC

8251 CAGGTGGACC AGGGCCCTGA CTGGGAACAG GGCTTCCACC GTATAAACAT

8301 TTATGAGGTT ATGAAGCCCC CAGCAGAAAT GGTTCCTGGA CACCTCATCA

8351 CACGACTACT GGACACCAGA CTAGTCCATC ACAATGTGAC ACGGTGGGAA
TGAGCCCTGC AGTCCTTCGC
CTGGCCATTG AGCTGACTCA 
GCATGTCAGA ATCAGCCGAT 
AACTCCGCCC CCTCCTGGTC 
TTGACCCGCA GGAGGGCCAA 
TACCTGGATG AGTATGACAA
TATCTTAAAA AAAAAAAAAA
GAAAGACAGA AAAGAAAAAA 
TTATGACTTT ACGTGCAAAT 
TGGACCCGGG AAAAGCAACC
CCTCCACCAG ACACGGACCC 
CGTTACCTCA AGGGAGTGGA
GCCGTCGCCA TTCACTATAC
TTCCACCACAAGAGGGCAGCCATCTCTAAAAA^

TTCAGAGAAATIX3CWCCAAACTTGAG 

AGCAGCACCTCTCTOGGCTACAAAAAAGAAG 

TCGAGTAACTCTCCAGAGGCATCCATTrc^ 

AGGATATCCTAAAOGGCCAAACTCTCTCITCTX^TGTTCCAGAGGCCCAA 

AGCTCCAAGGCATTGTTGATCTCATC^ 

TCTTGGGGTTGGTCCAACAGCTGTCA^ 

ACTTTCTCATTTAAATCTCATATAGGTrCGGAGTT^ 

TCCGCCTCCGCGATGACAGAAGCAATGGTTAACITCT 

TAGGGAAGGAAATGGCTTC AGAGG CGATCAGCCC1TTTSACTTA 

TACACGTCIXIAGTGGAGTGTTTTATTGCC^ 

TTCAGAGTCACAACTTCTGCAACACGTTITA 

ATCGCAAATTGCTCGATCTATCCCTTCCTCTCCTTTAATTTCCCTTO 

ACAGCCTTCCITCAAAAATACCTTATTTGACCT 

GCCAG<^CCTAATTTCCCTCTGTGGGTTGCT 

AACCTAGAGTTATTTTAGCTCCCCGACTGAAAAGCTAGCACACGTGGG 
AAAAAATC^TTAAAGCCCCTGCTTXriXXJTCTTTCT 

AAACTGGAAAGATCTGGTTCACAACGTAACGTTATTCA^^ 
GGACCAGGCAGAAAATTCAAAGGTCTOUULCCGGAATICT

GACTCTGGAGTAGGTGGGTGTGGAAGGGAAGATAAATATCACAAGTATCG 

AAGTGATCGCTTCTATAAAGAGAATTTCTATTAACTCTC^ 

ACATGGACACACACACACACACACACACACAC^ 

G GGATGT CCACTTTACAAGTGTGTATCTATGTTCAGAAACCTGTAC 

ATTTITATA ATIT ACATAAATAAATACATATAAAATATATCCAT CriUTr 

ATTAGATTCATTTATTTGAATATAAATGTATCAATATTTA 

TAATGCACTCAGATGTGTATCGGCTATTTCTCG^ 
GCCCGCCACCCAGATCTTCGCTGCGCCCTTGCCCGGACACG^

ACGATGGCTGCCCCX3AGCCATGGGTOGOGGCCCAGCrAACGCAGAAOGTC 

CX3TCCCTCGCCCGGCGAGTCCCGGAGCCAGCCCCGCGCCCCGCCAGCX3CT 

GGTCCCTGAGGCCGACGACAGCAGCAGCCTTGCCTCA^ 

GTCCOGGCCCCGCACTCCTCCCCCTGCTCGAGGCIGTGTCTCAGCACTTG 

GCTGGAGACTTCITGAACTTGCCGGGAGAGTGACTTGGGCTC^ 

GCGCCGGTGTCCTCGCCQ3GCGGATCC 